THE RELATION BETWEEN INFORMATION AND
VARIANCE ANALYSES*


W. R. GARNER 


THE JOHNS HOPKINS UNIVERSITY 
AND 
WILLIAM J. McGILL


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


Analysis of variance and uncertainty analysis are analogous techniques
for partitioning variability. In both analyses, negative interaction terms
may occur; they are due to negative covariance terms that appear when
non-orthogonal predictor variables are allowed. Uncertainties can be
estimated directly from variances if the form of distribution is assumed.
The decision as to which of the techniques to use depends partly on the
properties of the criterion variable. Only uncertainty analysis may be used
with a non-metric criterion. Since uncertainties are dimensionless (using no
metric), however, uncertainty analysis has a generality which may make it
useful even when variances can be computed.


I. Introduction 


Shannon (5) has defined amount of information by the formula 


$$H(y) = -\sum_{k=1}^{r} p(k)\,\log_2 p(k), \tag{1}$$


where y has r discrete values, and p(k) is a probability distribution defined 
over y. In communication theory, y is considered a source of signals, and 
the measure H represents the average number of binary digits required to 
code or store one of the signals. A broader interpretation, however, makes 
H a parameter which measures the non-metric variability of any probability 
distribution. H has a value of zero when the probability is concentrated in a 
single category and is maximum when the probability is uniformly distributed 
over all categories. 
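
The two limiting cases just described can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the function name `uncertainty` and the three example distributions are illustrative assumptions.

```python
import math

def uncertainty(probs):
    """Shannon uncertainty H, in bits, of a discrete probability distribution."""
    # Categories with zero probability contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions over r = 4 categories.
concentrated = uncertainty([1.0, 0.0, 0.0, 0.0])   # all probability in one category
uniform = uncertainty([0.25, 0.25, 0.25, 0.25])    # probability spread evenly
skewed = uncertainty([0.7, 0.1, 0.1, 0.1])         # an intermediate case

print(concentrated, uniform, skewed)
```

The concentrated distribution gives zero, the uniform one gives log2 of the number of categories, and any other distribution over the same categories falls between the two.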

Psychologists have been attracted by the non-metric character of this
measure and the obvious application to situations where variances cannot
be computed. Since this use of the measure is concerned only with its
statistical properties and not with its interpretation in communication
theory, we shall use the more general term uncertainty, U, to refer to the
measure. We shall show that uncertainty has many of the properties of
variance and can be partitioned into components as variance can.


*The work of the senior author was supported by Contract N5ori-166, Task
Order 1, between the U.S. Office of Naval Research and The Johns Hopkins
University. This is Report No. 166-I-192, Project Designation No. NR 145-089,
under that contract.


II. The Analysis Problem 


The relations discussed apply when a criterion is predicted from one 
or more predictors. The development will be presented for the three-variable 
case, where the problem is to determine to what extent values of the criterion 
variable can be predicted from two predictor variables. 

Our notation is as follows: The criterion variable $y$ can assume any
value $y_k$. The two predictor variables $w$ and $x$ can assume values
$w_i$ or $x_j$. We assume that all three variables are categorized in order
that the formulas for uncertainty and variance analysis may have equivalent
notations. This assumption does not limit any of the principles demonstrated.

In the three-dimensional matrix, $n_{ijk}$ refers to the number of cases in
a single cell; $n_{ij.}$ refers to the total number of cases having the ith
value of $w$ and the jth value of $x$; and $n_{i..}$ refers to the total
number of cases having the ith value of $w$. Similar subscripts indicate
other combinations of the three variables; $n$ with no subscript indicates
the total number of cases in the matrix. In analysis of variance formulas,
$\bar{y}$ indicates a mean value, and the subscript notation just illustrated
is used for mean values of the subclassifications.


III. The Nature of Uncertainty Analysis 


Analysis of variance can be considered as two separate processes. First, 
the variance of the criterion variable is partitioned into its several identifiable 
components—components which add up to the total variance. This process 
is a simple descriptive one; there are no probability assumptions involved 
in its use. One describes the components of a total variance, making no 
assumptions about the distributions from which the data are drawn. The 
second process, which is not a necessary consequence of the first, involves 
using these partitioned components to obtain estimates of population
variances and to make inferences about the parent population. For this process,
the actual data provide sample estimates of population distributions; here 
assumptions about the population distributions become critical. 

Uncertainty analysis likewise has both processes. The first process is 
purely descriptive: it is intended to allow the partitioning of the uncertainty 
of the criterion variable U(y) into components. Since this process is entirely 
descriptive, there are no underlying probability assumptions. All that is 
required for its use is that a data matrix of the type described above is
available. The primary purpose of this paper is to demonstrate the nature of
uncertainty partitioning and to compare it to variance partitioning. This
process is illustrated and explained in Table 1. The results of uncertainty
partitioning specify sources and magnitudes of variabilities as well as the
amount of categorical discrimination available. These uses are explained
more fully by Garner and Hake (1) and by McGill (3).

The $n_{ijk}$ can be considered as sample estimates of $p(i, j, k)$; we can
use these sample estimates to test various hypotheses about the parent
distribution. For example, suppose we wish to test the hypothesis that both
predictors are independent of the criterion, i.e.,

$$p(i, j, k) = p(i, j)\,p(k). \tag{2}$$

It can be shown (3, 4), by using the likelihood ratio, that when
hypothesis (2) is true, $1.3863\,n\,U(y: w, x)$ is distributed approximately
as chi square. Independent tests can be constructed in the same way for each
of the predictors separately as well as for the interaction between predictors.
The approximation to chi square is of the same order as the familiar
chi-square contingency test so that, in effect, uncertainty analysis is
analysis of contingency chi square. Miller and Madow (4) discuss this aspect
of uncertainty analysis more thoroughly.
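
The constant 1.3863 is 2 ln 2, which converts an uncertainty in bits into the usual likelihood-ratio statistic. The sketch below, in Python, checks this numerically. It is an illustration only: the 2 x 2 x 2 table of counts is hypothetical, and the multiple contingent uncertainty is computed from the standard identity U(y: w, x) = U(y) + U(w, x) - U(w, x, y), which is assumed here to agree with the definitions in Table 1.

```python
import math

def H(counts):
    """Uncertainty in bits of a list of cell counts, treated as relative frequencies."""
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

# Hypothetical counts n[i][j][k]: i indexes w, j indexes x, k indexes y.
n = [[[30, 10], [20, 20]],
     [[15, 25], [5, 35]]]
I, J, K = 2, 2, 2

total = sum(n[i][j][k] for i in range(I) for j in range(J) for k in range(K))
n_ij = [[sum(n[i][j]) for j in range(J)] for i in range(I)]                    # n_ij.
n_k = [sum(n[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]  # n_..k

U_y = H(n_k)
U_wx = H([n_ij[i][j] for i in range(I) for j in range(J)])
U_wxy = H([n[i][j][k] for i in range(I) for j in range(J) for k in range(K)])
U_multiple = U_y + U_wx - U_wxy          # U(y: w, x) in bits

# Statistic in the form given in the text: 1.3863 n U(y: w, x), with 1.3863 = 2 ln 2.
stat_from_uncertainty = 2 * math.log(2) * total * U_multiple

# Likelihood-ratio statistic for hypothesis (2), computed directly from the counts.
G = 2 * sum(
    n[i][j][k] * math.log(n[i][j][k] * total / (n_ij[i][j] * n_k[k]))
    for i in range(I) for j in range(J) for k in range(K)
    if n[i][j][k] > 0
)

print(stat_from_uncertainty, G)   # the two values agree up to rounding
```

The agreement of the two values is an algebraic identity, so it holds for any table of counts, not only for this one.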


IV. The Orthogonal Case 


Usually in analysis of variance and in uncertainty analysis, the
experimenter tries to set up orthogonal predictors. Orthogonality is defined
as zero association between the predictors. This requirement is met when
the cell frequencies in the matrix of the $n_{ij.}$ can be predicted correctly
from the row and column marginal frequencies, i.e., when

$$n_{ij.} = \frac{n_{i..}\,n_{.j.}}{n}. \tag{3}$$



Uncertainty Analysis 
The partitioning of $U(y)$ in uncertainty analysis is illustrated by

$$U(y) = U(y: w, x) + U_{wx}(y), \tag{4}$$

where the uncertainty measures have the definitions given in Table 1. The
second term on the right-hand side of (4) is the error uncertainty, i.e., the
amount of uncertainty in the criterion $y$ remaining after the predictable
uncertainty has been eliminated. The first term on the right-hand side of
(4) is the predictable uncertainty; it in turn can be partitioned into
components

$$U(y: w, x) = U(y: w) + U(y: x) + U(y: wx). \tag{5}$$


These terms are also defined in Table 1. A feature of uncertainty analysis
is the interaction term $U(y: wx)$. This is the uncertainty in $y$ predictable
from unique combinations of $w$ and $x$.

Equation (5) describes a process that is identical in form with the
partitioning of variance in analysis of variance; in the orthogonal case the
interaction uncertainty can be interpreted by analogy with interaction
variance. This is true despite the fact that interaction uncertainties are
sometimes negative (3). This problem will be discussed in detail in Section V.
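
The partition in (4) and (5) can be sketched in Python as follows. This is an illustration under stated assumptions: Table 1 is not reproduced here, so each contingent uncertainty is computed from the standard identity U(a: b) = U(a) + U(b) - U(a, b), the interaction is obtained by subtraction, and the data matrix is a hypothetical orthogonal design in which y depends only on the combination of w and x.

```python
import math
from collections import defaultdict

W, X, Y = 0, 1, 2   # positions of w, x, y in each cell key (w, x, y)

def H(cells, axes):
    """Joint uncertainty in bits of the variables at positions `axes`."""
    n = sum(cells.values())
    marginal = defaultdict(int)
    for key, count in cells.items():
        marginal[tuple(key[a] for a in axes)] += count
    return sum(-(c / n) * math.log2(c / n) for c in marginal.values() if c > 0)

def contingent(cells, a, b):
    """Contingent uncertainty U(a: b) = U(a) + U(b) - U(a, b)."""
    return H(cells, a) + H(cells, b) - H(cells, a + b)

# Hypothetical orthogonal design: each (w, x) combination has 20 cases,
# and y = 1 exactly when w and x differ.
cells = {(0, 0, 0): 20, (0, 1, 1): 20, (1, 0, 1): 20, (1, 1, 0): 20}

U_y = H(cells, (Y,))
U_y_w = contingent(cells, (Y,), (W,))           # U(y: w)
U_y_x = contingent(cells, (Y,), (X,))           # U(y: x)
U_predictable = contingent(cells, (Y,), (W, X)) # U(y: w, x)
U_interaction = U_predictable - U_y_w - U_y_x   # U(y: wx), by subtraction in (5)
U_error = U_y - U_predictable                   # error uncertainty in (4)

print(U_y, U_y_w, U_y_x, U_interaction, U_error)
```

In this constructed case neither predictor alone tells anything about y, so the whole bit of predictable uncertainty appears in the interaction term and the error uncertainty is zero.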


Analysis of Variance 

Uncertainty analysis is generally appropriate when the criterion variable 
y is a categorical variable, i.e., one allowing only nominal scale values (cf. 6). 
The predictor variables may be categorical, or they may be metric variables 
which are categorized for purposes of analysis. If the criterion is a true 
metric variable, i.e., one having at least the properties of an interval scale, 
we can compute variances and perform analysis of variance. The predictor 
variables must be categorized in any simple form of the analysis of variance. 

Equations describing analysis of variance are essentially identical to 
those of uncertainty analysis. The defining equations are given in Table 2; 
except for the fact that variances are computed from squared deviations, 
whereas uncertainties are computed from log-probabilities, the equations 
are identical to those in Table 1. The partition of the variance of the criterion 
can be written: 


$$V(y) = V(y: w, x) + V_{wx}(y). \tag{6}$$


Again the two parts on the right-hand side of the equation are the predictable 
and the error components of the total variance. The predictable variance 
can be broken down as before: 


$$V(y: w, x) = V(y: w) + V(y: x) + V(y: wx). \tag{7}$$


The terms in (7) are explained in detail in Table 2. 

Normally the analysis of variance in (7) is called double classification; 
the variances are generally identified in terms of the two predictors. This 
shorthand procedure is convenient for most purposes. However, it obscures 
the fact that the data array is three-dimensional. The analysis is identical 
to the one treated in uncertainty analysis in every respect, except that in 
the analysis of variance the criterion variable has a metric, whereas it does 
not in uncertainty analysis. 


V. The Non-Orthogonal Case 


In Section IV it was mentioned that the interaction term in uncertainty 
analysis can assume negative values under certain conditions. It is equally 
true that the interaction term in analysis of variance can be negative, if 
it is defined as in Table 2. The negative interaction term is due to
non-orthogonality and can be thought of as due to a negative covariance term
that may attenuate or exceed the positive interaction effect.






















Uncertainty Analysis 


It is not difficult to show that the interaction uncertainty in (5) can be
written

$$U(y: wx) = U_y(w: x) - U(w: x). \tag{8}$$

This form of the interaction term shows at once that interaction cannot
be negative with orthogonal predictors since orthogonality requires that
$U(w: x) = 0$.

In the non-orthogonal case, however, $U(w: x)$ will be greater than zero.
With certain combinations of cell frequencies, the contingent uncertainty
between $x$ and $w$ can be larger than the partial contingent uncertainty,
resulting in negative interaction. A simple illustration of this principle is
provided when each value of $w$ is paired uniquely with each value of $x$. Now
$U(w: x)$ is as large as it can be. Furthermore, $U_y(w: x)$ cannot be greater
than $U(w: x)$ since $U(w: x)$ is the maximum contingent uncertainty that can
be obtained from a contingency table involving $w$ and $x$. Equation (8) shows
that the interaction will never be greater than zero. An identical result is
obtained in the variance analysis when the predictors are completely
confounded.
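
The confounded case can be checked numerically. The Python sketch below is illustrative only: the counts are hypothetical, x is made identical to w, and the partial contingent uncertainty is computed from joint uncertainties as U_y(w: x) = U(w, y) + U(x, y) - U(y) - U(w, x, y), a standard identity assumed here to match the definition in Table 1.

```python
import math
from collections import defaultdict

W, X, Y = 0, 1, 2   # positions of w, x, y in each cell key (w, x, y)

def H(cells, axes):
    """Joint uncertainty in bits of the variables at positions `axes`."""
    n = sum(cells.values())
    marginal = defaultdict(int)
    for key, count in cells.items():
        marginal[tuple(key[a] for a in axes)] += count
    return sum(-(c / n) * math.log2(c / n) for c in marginal.values() if c > 0)

# Hypothetical confounded design: x always equals w; y follows w imperfectly.
cells = {(0, 0, 0): 15, (0, 0, 1): 5, (1, 1, 0): 5, (1, 1, 1): 15}

U_y_w = H(cells, (Y,)) + H(cells, (W,)) - H(cells, (W, Y))
U_y_x = H(cells, (Y,)) + H(cells, (X,)) - H(cells, (X, Y))
U_predictable = H(cells, (Y,)) + H(cells, (W, X)) - H(cells, (W, X, Y))

# Interaction obtained by subtraction, as in equation (5).
interaction_by_subtraction = U_predictable - U_y_w - U_y_x

# Interaction obtained from equation (8).
U_w_x = H(cells, (W,)) + H(cells, (X,)) - H(cells, (W, X))
U_partial = (H(cells, (W, Y)) + H(cells, (X, Y))
             - H(cells, (Y,)) - H(cells, (W, X, Y)))   # U_y(w: x)
interaction_by_eq8 = U_partial - U_w_x

print(interaction_by_subtraction, interaction_by_eq8)
```

Both routes give the same negative value, and with complete confounding that value is simply the negative of U(y: w).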


Analysis of Variance 


It is usually assumed that the components of the total variance in
analysis of variance must be positive. This is true only in the orthogonal
case; if an analysis of variance is carried out with a non-orthogonal
experimental design, using the equations given in Table 2, negative
interaction terms can occur.

To show how this happens, we now analyze the components of the 
interaction variance for the general case. The equation is 


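$$V(y: wx) = \frac{1}{n}\sum_{i,j} n_{ij.}\,(\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y})^2 - \frac{2}{n}\sum_{i,j}\left(n_{ij.} - \frac{n_{i..}\,n_{.j.}}{n}\right)(\bar{y}_{i..} - \bar{y})(\bar{y}_{.j.} - \bar{y}). \tag{9}$$
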

It can be seen that the interaction variance is composed of two parts: 
the first part is essentially the interaction variance in the orthogonal case; 
the second part is a negative covariance term. This term must be zero in the 
orthogonal case [see equation (3)], but in the non-orthogonal case it cannot 
be ignored. The redundancy introduced by non-orthogonality is illustrated
clearly in multiple regression. No interaction term is permitted, but a
correction for non-orthogonality must be introduced whenever the predictor
variables are correlated (cf. 2).
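
The two-part structure of the interaction variance can be verified on a small example. The Python sketch below is an illustration, not the authors' computation: the cell counts and cell means are hypothetical, the interaction variance is taken to be V(y: w, x) - V(y: w) - V(y: x) (assumed to correspond to Table 2), and marginal means are weighted by cell counts.

```python
# Hypothetical non-orthogonal 2 x 2 design: cells[(i, j)] = (count, cell mean of y).
# Cell means are purely additive, but w and x are positively associated.
cells = {(0, 0): (30, 0.0), (0, 1): (10, 1.0),
         (1, 0): (10, 1.0), (1, 1): (30, 2.0)}

n = sum(c for c, _ in cells.values())
grand = sum(c * m for c, m in cells.values()) / n

def marginal(axis):
    """Return {level: (count, weighted mean)} for predictor `axis` (0 = w, 1 = x)."""
    counts, sums = {}, {}
    for key, (c, m) in cells.items():
        counts[key[axis]] = counts.get(key[axis], 0) + c
        sums[key[axis]] = sums.get(key[axis], 0.0) + c * m
    return {lvl: (counts[lvl], sums[lvl] / counts[lvl]) for lvl in counts}

w_marg, x_marg = marginal(0), marginal(1)

# Predictable variances and the interaction obtained by subtraction.
V_predictable = sum(c * (m - grand) ** 2 for c, m in cells.values()) / n
V_y_w = sum(c * (m - grand) ** 2 for c, m in w_marg.values()) / n
V_y_x = sum(c * (m - grand) ** 2 for c, m in x_marg.values()) / n
interaction = V_predictable - V_y_w - V_y_x

# First part: the orthogonal-style interaction sum of squares.
part1 = sum(c * (m - w_marg[i][1] - x_marg[j][1] + grand) ** 2
            for (i, j), (c, m) in cells.items()) / n

# Second part: the covariance term, zero when n_ij = n_i n_j / n.
part2 = -2.0 / n * sum((c - w_marg[i][0] * x_marg[j][0] / n)
                       * (w_marg[i][1] - grand) * (x_marg[j][1] - grand)
                       for (i, j), (c, m) in cells.items())

print(interaction, part1, part2)
```

Here the first part is positive, the covariance part is larger in magnitude and negative, and their sum reproduces the negative interaction found by subtraction.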


VI. Effects of Non-Orthogonality


Our discussion of non-orthogonality shows that it is best to design
experiments with orthogonal predictor variables. The analysis is simplified,
and the uninterpretable interaction components are eliminated.

Clearly the covariance in (9) is not just part of the interaction variance. 
In fact, when predictors are non-orthogonal, the concept of interaction is 
almost meaningless. For example, consider an analysis of variance in which 
the predictors w and x are completely confounded. The two main-effect 
variances and the interaction will all be identical. The covariance term in 
(9) must be large enough to cancel out two of these variances, but we do 
not know which two of the variances should be cancelled out. In a sense, the 
covariance term is a correction factor which must be applied to the entire 
set of variances. Thus, a covariance term (whether or not it is large enough 
to produce a negative interaction) renders an exact interpretation of the 
component variances impossible. 

The multiple contingent uncertainty or the total predictable variance 
can be computed directly as shown in the defining equations in Tables 1 and 
2. The negative covariance term is included; there is no over-estimation 
of the total predictable variance or uncertainty. However, the interpretation 
of results should be made only in terms of combinations of the two predictors 
—no valid statements can be made about them independently. 

Sometimes it is impossible to obtain orthogonal predictor variables, 
particularly when there are more than two. In time series successive events 
are usually not orthogonally related because no independent control of these 
events is possible. If the time series has serial dependencies, preceding events 
cannot be orthogonal. Consequently, the total predictability of events in 
a time series cannot in general be computed by adding up the separate 
predictabilities obtained from preceding events displayed by one or more 
units in the time series. 


VII. Estimation of Uncertainties from Variances 


It is clear that uncertainty analysis and analysis of variance are analogous 
analytic techniques. In fact, variances may be used to estimate uncertainties 
if we assume that y is normally distributed. 

Shannon (5) has shown that the uncertainty of a normal distribution 
can be specified as 


$$\text{est } U(y) = \tfrac{1}{2} \log_2 2\pi e V(y) - \log_2 m, \tag{10}$$


where est U(y) is the estimated total uncertainty of the criterion variable 
on the assumption of a normal distribution of values of y, , and where m is 
the width of the category interval on the y continuum. 

We can write similar equations for any of the variances obtained in 
analysis of variance. For example,

$$\text{est } U_{wx}(y) = \tfrac{1}{2} \log_2 2\pi e V_{wx}(y) - \log_2 m \tag{11}$$

is the error uncertainty estimated from error variance. From definition
(6) in Table 1, and from equations (10) and (11), we can write

$$\text{est } U(y: w, x) = \tfrac{1}{2} \log_2 [V(y)/V_{wx}(y)]. \tag{12}$$


Thus, it is relatively simple to estimate the multiple contingent
uncertainty from the appropriate variances. The expression on the right-hand
side of this equation is reminiscent of the multiple correlation ratio (7).
We can, in fact, write

$$\text{est } U(y: w, x) = -\tfrac{1}{2} \log_2 [1 - \eta^2(y: w, x)]. \tag{12-A}$$

Estimated uncertainties have the properties of additivity observed in 
computed uncertainties. Consequently, the expression on the right-hand 
side of (12) can be partitioned into three components, each of which is based 
on its equivalent variances as follows: 


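$$\text{est } U(y: w) = \tfrac{1}{2} \log_2 [V(y)/V_w(y)], \tag{13}$$
$$\text{est } U(y: x) = \tfrac{1}{2} \log_2 [V(y)/V_x(y)], \tag{14}$$
$$\text{est } U(y: wx) = \tfrac{1}{2} \log_2 \{[V_w(y) \cdot V_x(y)]/[V(y) \cdot V_{wx}(y)]\}. \tag{15}$$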


These estimating equations point out some of the differences between
uncertainty and variance. If (15) is used to estimate the interaction
uncertainty when the interaction variance is zero, cases can be found in which
the estimated interaction uncertainty (and the computed interaction
uncertainty) will not be zero. Converse cases (i.e., zero uncertainty
interactions with finite variance interactions) can also be found. These
apparent contradictions are due to the fact that variances and uncertainties,
while analogous, do not measure exactly the same characteristics of
probability distributions. Uncertainty analysis depends on the number of
categories occupied by a distribution. Variance analysis depends on the
weights or values attached to these categories.
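
A short numerical sketch may make the estimating equations concrete. The Python code below uses hypothetical variance components for an orthogonal design with zero interaction variance. It assumes that V_w(y) and V_x(y) denote the variance of y remaining after w alone and after x alone, by analogy with the error variance V_wx(y).

```python
import math

def est_contingent(v_total, v_residual):
    """Estimated contingent uncertainty in bits: 0.5 * log2(V(y) / residual variance)."""
    return 0.5 * math.log2(v_total / v_residual)

# Hypothetical orthogonal analysis of variance (arbitrary units).
V_y = 10.0
V_y_w, V_y_x, V_y_int = 4.0, 3.0, 0.0    # main effects and a zero interaction variance

V_w = V_y - V_y_w                         # assumed meaning of V_w(y): residual after w
V_x = V_y - V_y_x                         # assumed meaning of V_x(y): residual after x
V_wx = V_y - (V_y_w + V_y_x + V_y_int)    # error variance

est_multiple = est_contingent(V_y, V_wx)                    # equation (12)
est_w = est_contingent(V_y, V_w)                            # equation (13)
est_x = est_contingent(V_y, V_x)                            # equation (14)
est_int = 0.5 * math.log2((V_w * V_x) / (V_y * V_wx))       # equation (15)

eta_squared = 1.0 - V_wx / V_y                              # multiple correlation ratio squared
est_multiple_alt = -0.5 * math.log2(1.0 - eta_squared)      # equation (12-A)

print(est_multiple, est_w + est_x + est_int, est_int)
```

The three component estimates add up to the multiple estimate, and the estimated interaction uncertainty is positive even though the interaction variance was set to zero, which is the kind of case described above.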


VIII. Application of the Measures 


We have now shown that uncertainty analysis and analysis of variance 
are equivalent in many respects; the question naturally arises as to when 
one should be used in preference to the other. This decision depends on the 
properties of the data and the assumptions the experimenter is willing to 
make. If the criterion variable y has only the properties of a nominal or 
ordinal scale, then only uncertainty analysis is permissible. Uncertainty 
analysis has the greater generality and requires no assumptions about metric 
properties of the criterion. 

On the other hand, uncertainty analysis does not give any information
about the metric if it exists. If the criterion variable is metric with at least
the properties of an interval scale, then analysis of variance must be used
to retain information about the metric. The variance measure in retaining
the metric sacrifices generality since the variances obtained from one
experiment are not directly comparable to those obtained from another. Thus,
the fact that the uncertainty measure is dimensionless gives it a generality
which allows direct comparison of experimental results which differ in their
metric.

To summarize, the measures are similar in many respects, but they are
not identical. The uncertainty measure has greater generality and the
advantages of generality. The variance measure is more specific but retains
information about the metric. The decision as to which to use depends not
only upon the properties of the criterion variable but also upon the gain
expected from being more sensitive instead of more general. In many
applications it is reasonable to use both measures and compare them.


MEASUREMENT OF SUBJECTIVE VALUES

HAROLD GULLIKSEN 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 

Four different value laws are developed and tested by using them to
predict the scale values of composite stimuli from the scale values of their
components. These four laws are: an additive law, a square-root law, a
logarithmic law, and a negative exponential law. They are tried out on a set
of food preferences by means of Pearson’s Method of False Position. The
negative exponential law of diminishing returns gave the best fit to the data
but was not markedly better than any of the other laws.


The purpose of this study is to show that laws relating subjective value
to amount of commodity may be studied by an extension of the usual
psychophysical scaling methods. Four different value laws will be developed
and tested with a set of experimental data on food preferences.

The psychophysical scaling procedures, such as paired comparisons, 
for example, may be used to distinguish between various types of laws 
expressing value increase as a function of increase in amount of the
commodity. These procedures are applicable even when no physical measurement
of the amount of the commodity can be made, and when the scaling procedure 
necessitates measuring from an arbitrary origin. 

Testing each of these laws of value increase also involves a corresponding 
determination of an origin, or point of zero value. Various psychophysical 
methods of determining a zero point have been presented in the literature. 
An additive law of value increase has been used for this purpose (cf. 1, 4, 
and 7). It will be shown here that other laws of value increase may also be 
used to determine an origin or point of zero value. 

The data necessary to test these formulations are obtained by using
a preference schedule like that illustrated in Table 1. The subject is asked
the usual paired comparison question, “Which do you prefer, i or j?”
However, in addition to the single stimuli i and j, composite stimuli of the
form (i and j) or (g and h) are used. The subject is asked questions of the
form, “Do you prefer (i and j) or g?” as well as, “Do you prefer (i and j) or
(g and h)?” (15).

Do these different stimuli, designated by i, j, g, h, (i and j), (g and h),
etc., behave as if they are different amounts of some commodity $x$ whose
subjective value $v$ is given by some function, say $v = f(x)$? For any given
function, the experimental device of utilizing the composite stimuli of the
type (i and j) may be indicated by writing

$$v_i = f(x_i); \quad v_j = f(x_j); \quad \text{and} \quad v_{ij} = f(x_i + x_j). \tag{A}$$

For example, if the items i, j, and (i and j) were bundles of dollars, such
that the number of dollars in each bundle were known to the subject, but
not to the experimenter, then the experimenter could raise the question
asked here. Do these different bundles and their combinations behave as
if they were different packages containing different numbers of dollars $x$,
which were known to the subject and were related to the subjective value
$v$ by some function $v = f(x)$?

It is of interest to note that since

$$x_i = f^{-1}(v_i); \quad x_j = f^{-1}(v_j); \quad \text{and} \quad x_i + x_j = f^{-1}(v_{ij}), \tag{B}$$

we have

$$f^{-1}(v_{ij}) = f^{-1}(v_i) + f^{-1}(v_j). \tag{C}$$

From the viewpoint of psychological scaling, we do not obtain v’s, but
instead obtain s’s which differ from the v’s by a constant, so we may write

$$f^{-1}(s_{ij} - C) = f^{-1}(s_i - C) + f^{-1}(s_j - C). \tag{D}$$

In other words, if one can find any function, designated $f^{-1}$, such that an
additive relationship holds for the proper scale values as indicated, then
one possible solution for the relationship between $v$ and $x$ is the inverse
$f$ of this function $f^{-1}$. From this point of view the commodity amounts are
simply defined by $x_i = f^{-1}(v_i)$.

If another function $g$ is found such that

$$g^{-1}(v_{ij}) = g^{-1}(v_i) + g^{-1}(v_j), \tag{E}$$

we may regard it as defining another commodity amount $y_i = g^{-1}(v_i)$. It is
of interest to ask about the relationship between $x_i$ and $y_i$ or between
$g^{-1}$ and $f^{-1}$. Substituting (A) in (E),

$$g^{-1}f(x_i + x_j) = g^{-1}f(x_i) + g^{-1}f(x_j). \tag{F}$$

A theorem in functional equations states that if

$$F(X + Y) = F(X) + F(Y), \tag{G}$$

then

$$F(X) = aX, \tag{H}$$

where $a$ is a constant coefficient. This theorem was proved for continuous
functions by Cauchy (2). The discontinuous solution for (G) is discussed by
Hamel (6) and Sierpiński (11). Applying solution (H) to (F) we have

$$g^{-1}f(x_i) = a x_i.$$

Utilizing (B) gives

$$g^{-1}(v_i) = a f^{-1}(v_i) \quad \text{or} \quad y_i = a x_i.$$


In other words, the “different” value laws found in this way would differ
only in the unit of measurement used for the commodity. It should be noted
that this statement holds only if equations (B) and (E) hold without error
for values of $v_i$, $v_j$, and $v_{ij}$.

In order to test different value laws, the theoretical formulation will
need to show the relationship between the values $v_i$ and $v_j$ for the
component stimuli and the value $v_{ij}$ of the composite stimulus. Let us see
what different laws of value increase imply with respect to this relationship.


Four Laws of Value Increase 


Logarithmic Law 


If it is assumed that the rate of change in value $v$ is inversely
proportional to the amount of the commodity $x$, we have the differential
equation

$$dv/dx = k/x. \tag{1}$$

Integrating this equation and setting $x = x_0$ when $v = 0$, we have

$$v = k \log (x/x_0). \tag{2}$$


This derivation of a logarithmic law of value increase was given by Thurstone 
(13). 

What does this logarithmic value law imply with respect to the
relationship between the values of the component and the composite stimuli?
In order to determine this, we note that

$$v_{ij} = k \log [(x_i/x_0) + (x_j/x_0)], \quad v_i = k \log (x_i/x_0), \quad \text{and} \quad v_j = k \log (x_j/x_0). \tag{3}$$

Eliminating $x_i$ and $x_j$ among these equations enables us to find an
expression for the composite value $v_{ij}$ in terms of the values $v_i$ and
$v_j$ of the components. Thus, we find that

$$e^{v_{ij}/k} = e^{v_i/k} + e^{v_j/k}. \tag{4}$$

These equations do not require the measurement of amount of commodity
$x$, but require only a set of value measurements $v$. However, the usual
scaling data give neither the v’s nor the x’s but only scale values, $s$,
which differ from $v$ by a constant. Let $C$ be the scale value for which $v$
equals zero.
By making the substitutions

$$v_i = s_i - C, \quad v_j = s_j - C, \quad \text{and} \quad v_{ij} = s_{ij} - C, \tag{5}$$

let us determine the type of relationship among the scale values of the
component stimuli and the composite stimuli that is implied by the
logarithmic law of value increase. Substituting and simplifying gives

$$e^{s_{ij}/k} = e^{s_i/k} + e^{s_j/k}. \tag{6}$$


This equation gives the interrelationships among the experimentally
determinable scale values which are implied by the logarithmic law of value
increase.

There are some interesting and disconcerting things about this law.
It contains the parameter $k$, which is involved in such a way that it cannot
readily be solved for explicitly. Thus, it remains as an annoying trial
parameter in any attempts to verify this equation. It is also interesting to
note that the equation does not contain $C$ at all. In other words, for the
logarithmic law of value increase the additive constant $C$ cannot be
determined. Any additive constant is consistent with the relationships among
the component and composite scale values. Likewise, any value of $x_0$ is
consistent with these relationships, since for this law, $x_0$ functions as a
unit of measurement for $x$.
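
The indeterminacy of C under the logarithmic law can be illustrated with a short Python sketch. The parameter values, the commodity amounts, and the function names are hypothetical; natural logarithms are used, which matches the exponential form of equation (6).

```python
import math

def predicted_composite(s_i, s_j, k):
    """Composite scale value implied by eq. (6): exp(s_ij/k) = exp(s_i/k) + exp(s_j/k)."""
    return k * math.log(math.exp(s_i / k) + math.exp(s_j / k))

# Hypothetical parameters and commodity amounts.
k, x0 = 1.3, 0.5
x_i, x_j = 2.0, 3.0

def value(x):
    """Subjective value under the logarithmic law, v = k log(x / x0)."""
    return k * math.log(x / x0)

# Shift every scale value by a different additive constant C each time
# and compare the prediction from eq. (6) with the true composite scale value.
errors = []
for C in (-2.0, 0.0, 1.5):
    s_i, s_j, s_ij = value(x_i) + C, value(x_j) + C, value(x_i + x_j) + C
    errors.append(abs(predicted_composite(s_i, s_j, k) - s_ij))

print(errors)   # all differences are at rounding level, whatever C is
```

Because the prediction holds equally well for every C tried, the scale values alone cannot single out a zero point under this law.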


Square-Root Law 


If it is assumed that the rate of change in value $v$ is inversely
proportional to the value level already attained, we may write the
differential equation

$$dv/dx = k/2v. \tag{7}$$

Integrating and setting $x = x_0$ when $v = 0$ we have

$$v = \sqrt{k(x - x_0)}. \tag{8}$$

This derivation of a square-root law of value increase was given by Thurstone
(13).

To determine the implications of this square-root law with respect to
the relationship between the values of the component and composite stimuli,
we note that

$$v_i = \sqrt{k(x_i - x_0)}, \quad v_j = \sqrt{k(x_j - x_0)}, \quad v_{ij} = \sqrt{k(x_i + x_j - x_0)}. \tag{9}$$

Eliminating $x_i$ and $x_j$ among these equations enables us to find an
expression for the composite value $v_{ij}$ in terms of the values $v_i$ and
$v_j$ of the components. Thus, we find that

$$v_{ij}^2 = v_i^2 + v_j^2 + k x_0. \tag{10}$$


In order to determine the relationship among the scale values implied by
this equation, we substitute (5) in (10), giving

$$s_{ij}^2 - s_i^2 - s_j^2 = 2C(s_{ij} - s_i - s_j) + C^2 + k x_0. \tag{11}$$

This equation gives the interrelationship among the experimentally
determinable scale values which is implied by the square-root law of value
increase. The plot of the quantity on the left side of the equation against
the quantity in parentheses enables one to find $C$ from the slope of the
line, and also to find $k x_0$ by subtracting the square of half the slope
from the intercept. Separate values for $k$ and $x_0$ cannot be determined.
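
The plotting procedure just described amounts to fitting a straight line. The Python sketch below illustrates it on synthetic, error-free scale values generated from the square-root law; the parameter values, commodity amounts, and the use of ordinary least squares are assumptions made for the illustration, not details taken from the study.

```python
import math
from itertools import combinations

# Hypothetical "true" parameters used only to generate synthetic scale values.
k, x0, C = 2.0, 0.5, 0.8
amounts = [1.0, 2.0, 3.5, 5.0]    # hypothetical commodity amounts, all above x0

def scale_value(x):
    """Scale value s = v + C under the square-root law v = sqrt(k (x - x0))."""
    return math.sqrt(k * (x - x0)) + C

left, right = [], []
for x_i, x_j in combinations(amounts, 2):
    s_i, s_j, s_ij = scale_value(x_i), scale_value(x_j), scale_value(x_i + x_j)
    left.append(s_ij ** 2 - s_i ** 2 - s_j ** 2)   # left side of eq. (11)
    right.append(s_ij - s_i - s_j)                 # quantity in parentheses

# Ordinary least-squares line: left = slope * right + intercept.
m = len(left)
mean_r, mean_l = sum(right) / m, sum(left) / m
slope = (sum((r - mean_r) * (l - mean_l) for r, l in zip(right, left))
         / sum((r - mean_r) ** 2 for r in right))
intercept = mean_l - slope * mean_r

C_hat = slope / 2.0                    # slope is 2C
kx0_hat = intercept - C_hat ** 2       # intercept is C^2 + k x0

print(C_hat, kx0_hat)
```

With error-free values the fit recovers C and the product k x0, but nothing in the fit separates k from x0, in line with the remark above.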


Negative Exponential Law 


If it is assumed that the rate of change in value $v$ is directly
proportional to the difference between the value level already reached and an
asymptotic value level $A$, we have the differential equation

$$dv/dx = k(A - v). \tag{12}$$

Integrating and setting $x = x_0$ when $v = 0$ gives

$$v = A - A e^{k x_0} e^{-k x}. \tag{13}$$


This is the familiar negative exponential law of diminishing returns used in
economics (see, for example, 8 and 12). It has also been suggested by a
number of writers as an equation of the learning curve (see illustrations
cited in 3). If we apply it to the component stimuli “i” and “j” and also to
the composite “i and j”, we have

$$v_i = A - A e^{k x_0} e^{-k x_i}, \quad v_j = A - A e^{k x_0} e^{-k x_j}, \quad v_{ij} = A - A e^{k x_0} e^{-k(x_i + x_j)}. \tag{14}$$

Eliminating $x_i$ and $x_j$ to find $v_{ij}$ as a function of $v_i$ and $v_j$
gives

$$v_{ij} = A - (1/A)\,e^{-k x_0}(A - v_i)(A - v_j). \tag{15}$$


If $v_i$ is plotted against $v_{ij}$ for a given value of $v_j$, the result is
a straight line. If this plot is made for each of the values of $v_j$, the
result is a family of straight lines. We also note that if either $v_i$ or
$v_j$ is equal to $A$, then the right-hand term of the equation vanishes,
giving $v_{ij} = A$. Thus, the indicated plots constitute a pencil of straight
lines intersecting in the point $(A, A)$. Again we may substitute for the v’s
in terms of the s’s (equations 5), obtaining

$$s_{ij} = A + C - (1/A)\,e^{-k x_0}(A + C - s_i)(A + C - s_j). \tag{16}$$

Again, if $s_i$ is plotted against $s_{ij}$ for a given value of $s_j$, the
result is a straight line. If such a plot is made in turn for each of the
possible values of $s_j$ the result is a pencil of straight lines intersecting
in the point $(A + C, A + C)$. If $x_0 = 0$ this series of straight lines may
be used to give the values of $A$ and of $C$.
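
The geometry of the pencil of lines can be sketched in Python. The parameter values and the two fixed values of s_j below are hypothetical, x_0 is set to zero as in the special case mentioned above, and the recovery of A from the slope is an illustration of one way the lines could be used, not a procedure taken from the study.

```python
import math

def composite(s_i, s_j, A, C, k, x0):
    """Composite scale value from eq. (16)."""
    return A + C - (1.0 / A) * math.exp(-k * x0) * (A + C - s_i) * (A + C - s_j)

# Hypothetical parameters, with x0 = 0.
A, C, k, x0 = 3.0, 0.5, 0.9, 0.0

def line(s_j):
    """Slope and intercept of s_ij as a linear function of s_i, for fixed s_j."""
    y0 = composite(0.0, s_j, A, C, k, x0)
    y1 = composite(1.0, s_j, A, C, k, x0)
    return y1 - y0, y0

# Two members of the pencil, for two hypothetical values of s_j.
m1, b1 = line(1.0)
m2, b2 = line(2.0)

# Their intersection should be the point (A + C, A + C).
s_star = (b2 - b1) / (m1 - m2)
s_ij_star = m1 * s_star + b1

# With x0 = 0 the slope for fixed s_j is (A + C - s_j) / A, which separates A from C.
A_hat = (s_star - 1.0) / m1
C_hat = s_star - A_hat

print(s_star, s_ij_star, A_hat, C_hat)
```

The intersection fixes the sum A + C, and one slope is then enough to split that sum into A and C when x_0 is zero.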


Linear Law 


If we assume that the rate of change of value is constant, we have the
differential equation

$$dv/dx = k. \tag{17}$$

Integrating and setting $x = x_0$ when $v = 0$, we have

$$v = k(x - x_0). \tag{18}$$

If we solve for the interrelationships among the component and composite
scale values implied by this equation, we find that

$$s_{ij} = s_i + s_j - C + k x_0. \tag{19}$$

This relationship has been derived by Thurstone and utilized in an
unpublished study (15). According to this law, the plot of the scale value of
the composite against the sum of the scale values of the components should be
linear with unit slope and intercept $k x_0 - C$.
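
As a brief check of equation (19), the Python sketch below generates scale values from the linear law and confirms that the composite exceeds the sum of its components by the same constant for every pair. All parameter values and commodity amounts are hypothetical.

```python
# Hypothetical parameters for the linear law v = k (x - x0), with s = v + C.
k, x0, C = 1.5, 0.4, 0.7

def scale_value(x):
    """Scale value of an amount x under the linear law."""
    return k * (x - x0) + C

# Hypothetical pairs of commodity amounts (x_i, x_j).
pairs = [(1.0, 2.0), (0.5, 3.0), (2.5, 4.0)]

# s_ij - (s_i + s_j) should be the same constant, k x0 - C, for every pair.
offsets = [scale_value(x_i + x_j) - (scale_value(x_i) + scale_value(x_j))
           for x_i, x_j in pairs]

print(offsets, k * x0 - C)
```

Only the combination k x0 - C appears in the offsets, so this plot by itself does not separate C from k x0.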


The Food Preference Experiment 


Let us now consider the type of data that is used for these value studies. 
Food preferences were studied. 

Pairs of single items are presented such as Beef vs. Pork, and the subject
is asked to indicate which he would choose, i or j. Then pairs of what we shall
term composite items are presented. The subject is asked to choose between
Beef and Steak vs. Tongue and Lamb. Also the choice is given between single 
and composite items, such as Steak vs. Pork and Tongue. Table 1 shows 
three typical items in the schedule. 

The set of 5 component stimuli and 10 composite stimuli (making 15 
stimuli in all) were presented in a paired comparison schedule to 92 college 
students in a psychology class. The directions stressed that for a composite 
such as Beef and Steak, each is an ordinary sized serving, and that if the 
composite were chosen, the person was to eat both, thus having twice as much 
as if only a single component item were chosen. This was done in order to give 
a better chance for diminishing returns to be manifested in the results. All
choices involving a duplicate item were omitted, resulting in a matrix of
incomplete data. A least squares procedure was used (5) for scaling a paired
comparisons matrix of incomplete data.


TABLE 1

Sample Questionnaire Items

38.   Roast Rib of Prime Beef          Roast Loin of Pork

39.   Roast Rib of Prime Beef          Boiled Smoked Beef Tongue
      Sirloin Steak                    Loin Lamb Chop

40.   Sirloin Steak                    Roast Loin of Pork
                                       Boiled Smoked Beef Tongue


The scale values obtained range from .000 for tongue and pork, the 
least preferred item, on up through 1.043 for lamb, to 2.622 for the composite, 
beef and steak, which was the most preferred item. The complete set of 
scale values is shown in Table 2. 

The fact that these 15 scale values give a good fit (5) to the 55 paired 
comparisons judgments shows that persons can make consistent judgments 
about preferences for composite stimuli along with single stimuli of the type 
used here. Thus, it is experimentally feasible to present in a single schedule 
comparisons of the (i vs. j), (i vs. g and h), and (i and j vs. g and h) types. 
Any set of concrete objects or even abstract concepts can be dealt with 
according to this pattern. 

In Table 2 we notice that the value of tongue and lamb is higher than 
tongue alone, pork and lamb is higher than pork alone, beef and lamb is 
higher than beef alone, and lamb and steak is higher than steak alone. Thus, 
the value of any item is increased by forming a composite with lamb. The 
same holds true for beef and steak. Thus, lamb, beef, and steak are all positive 
values. However, now look at pork. The value of pork and steak is lower 
than the value of steak alone, of pork and beef lower than beef alone, of 
pork and lamb lower than lamb alone, and the value of the composite tongue 
and pork is lower than that of either of the components. Thus, from the 
purely ordinal characteristics of the scale, it seems clear that pork has a 
negative value. The same is true to an even greater extent for tongue. The 
zero point is between pork and lamb since pork and tongue are negative 
and lamb, beef, and steak are positive. 


TABLE 2 

Food Preference Experiment 

Stimuli                  Scale Values 
Tongue and Pork             .000 
Tongue                      .137 
Tongue and Lamb             .270 
Pork                        .541 
Tongue and Beef             .830 
Pork and Lamb               .928 
Lamb                       1.043 
Tongue and Steak           1.088 
Pork and Beef              1.448 
Beef                       1.746 
Pork and Steak             1.780 
Lamb and Beef              1.993 
Steak                      2.197 
Lamb and Steak             2.324 
Beef and Steak             2.622 
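
The ordinal argument about signs can be checked mechanically. The sketch below is only an illustration: the helper name `ordinal_sign` is hypothetical, and the pork entry is taken as .541, an assumed reading of a value that is hard to read in the table. Any value between its neighbours .270 and .830 gives the same signs.

```python
# Scale values of the five single items (pork = .541 is an assumed reading).
single = {"tongue": 0.137, "pork": 0.541, "lamb": 1.043,
          "beef": 1.746, "steak": 2.197}

# Scale values of the ten composites from Table 2.
composite = {
    frozenset(("tongue", "pork")): 0.000,
    frozenset(("tongue", "lamb")): 0.270,
    frozenset(("tongue", "beef")): 0.830,
    frozenset(("tongue", "steak")): 1.088,
    frozenset(("pork", "lamb")): 0.928,
    frozenset(("pork", "beef")): 1.448,
    frozenset(("pork", "steak")): 1.780,
    frozenset(("lamb", "beef")): 1.993,
    frozenset(("lamb", "steak")): 2.324,
    frozenset(("beef", "steak")): 2.622,
}


def ordinal_sign(item):
    """+1 if adding `item` raises the value of every other item,
    -1 if it lowers every one, 0 if the evidence is mixed."""
    diffs = [composite[frozenset((item, other))] - single[other]
             for other in single if other != item]
    if all(d > 0 for d in diffs):
        return 1
    if all(d < 0 for d in diffs):
        return -1
    return 0


signs = {item: ordinal_sign(item) for item in single}
```

Under these values tongue and pork come out negative and lamb, beef, and steak positive, which is the ordering the text uses to place the zero point between pork and lamb.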





Generalization to Include Positive and Negative Values 


For this particular set of data there is thus clear evidence that some 
of the component values are positive and others are negative. The linear 
value law extends readily to include both positive and negative values and 
their various combinations. 

The other laws, however, in their previously stated form do not give 
reasonable results for both positive and negative values. However, it is 
possible to make an appropriate extension by having four different rules, 
depending first on whether the components had the same or different signs, 
and second on whether the composite was positive or negative. 

A reasonable interpretation including both negative and positive values 
and their combinations for the square-root law may be made by assuming 
$x_0 = 0$ and writing 

$$\hat{s}_{ij} = \frac{v_i + v_j}{|v_i + v_j|}\,\sqrt{\bigl|\,v_i|v_i| + v_j|v_j|\,\bigr|} + C, \tag{20}$$

where $v_i = s_i - C$; $v_j = s_j - C$. 

This formulation merely says that the squared values of $v_i$ and $v_j$ are 
added if $v_i$ and $v_j$ are of the same sign and subtracted if $v_i$ and $v_j$ 
are of unlike signs. The square root of the result is then given the sign of 
the larger value. If $v_i$ and $v_j$ are both positive, (20) is equivalent to 
(10) or (11). If $v_i$ and $v_j$ are both negative, (20) is equivalent to a 
reflection into the quadrant where $s_i$, $s_j$, and $s_{ij}$ are each negative. 
If $v_i$ and $v_j$ are of opposite sign, (20) gives an interpretation 
that is consistent with the previous cases. 
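
As an illustration of the verbal rule for the square-root law, the following minimal sketch squares each value, keeps its sign, combines the two, and gives the root the sign of the larger value. The function name `sqrt_law_composite` is hypothetical, and the code follows the description in the text rather than any published computing routine.

```python
import math


def sqrt_law_composite(s_i, s_j, C):
    """Composite scale value under the signed square-root law.

    v = s - C is the value measured from the zero point C. Squares are
    added for like signs and subtracted for unlike signs; the root takes
    the sign of the larger component.
    """
    v_i, v_j = s_i - C, s_j - C
    signed_sum = v_i * abs(v_i) + v_j * abs(v_j)
    sign = 1.0 if signed_sum >= 0 else -1.0
    return sign * math.sqrt(abs(signed_sum)) + C


both_positive = sqrt_law_composite(3.0, 4.0, 0.0)     # sqrt(9 + 16)
both_negative = sqrt_law_composite(-3.0, -4.0, 0.0)   # reflected case
mixed_signs = sqrt_law_composite(5.0, -3.0, 0.0)      # sqrt(25 - 9)
```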

In order to state the logarithmic law for a series of either positive or 
negative values we have analogously with the square-root law: 

$$\hat{s}_{ij} = \frac{v_i + v_j}{|v_i + v_j|}\; k \log\left|\,\frac{v_i}{|v_i|}\,e^{|v_i|/k} + \frac{v_j}{|v_j|}\,e^{|v_j|/k}\,\right| + C, \tag{21}$$

where $v_i = s_i - C$; $v_j = s_j - C$. 


Again this formula merely states that e is taken to a positive power 
for either positive or negative values. The resulting powers are added if 
$v_i$ and $v_j$ are of the same sign and subtracted if $v_i$ and $v_j$ have different 
signs. The logarithm is then given the sign of the larger value. In contrast 
to (6), it is now possible to determine C for (21) since the combination of 
negative and positive values is involved. 
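
The same rule for the logarithmic law can be sketched as follows. This is an illustration under the reading given in the text: each value is turned into a positive power of e, the powers are added or subtracted according to sign, and the logarithm takes the sign of the larger value. The names `log_law_composite` and `_sign` are hypothetical, and a value exactly at the zero point is treated as positive by assumption.

```python
import math


def _sign(v):
    # Assumption: a value exactly at the zero point counts as positive.
    return 1.0 if v >= 0 else -1.0


def log_law_composite(s_i, s_j, C, k):
    """Composite scale value under the signed logarithmic law."""
    v_i, v_j = s_i - C, s_j - C
    total = (_sign(v_i) * math.exp(abs(v_i) / k)
             + _sign(v_j) * math.exp(abs(v_j) / k))
    if total == 0:
        raise ValueError("equal and opposite values: the logarithm is undefined")
    # The sign of `total` is the sign of the component larger in absolute value.
    return _sign(total) * k * math.log(abs(total)) + C


same_sign = log_law_composite(math.log(2), math.log(3), 0.0, 1.0)   # log(2 + 3)
opposite = log_law_composite(math.log(5), -math.log(2), 0.0, 1.0)   # log(5 - 2)
```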

The negative exponential law may be restated for the case in which both 
negative and positive values are involved. Let $x_0 = 0$; thus 

$$1 - \frac{v}{A} = e^{-kx} \qquad (\text{where } v = s - C). \tag{22}$$

The formulas are easier to work with if we define B as the asymptote 
expressed in terms of the s-scale, just as C is defined: 

$$B = A + C.$$

Let us assume that the positive (upper) asymptote $B^+$ may vary independently 
of the negative (lower) asymptote $B^-$. Then let us define 

$$R_i = \frac{B^{\pm} - s_i}{|B^{\pm} - C|} = \frac{x_i}{|x_i|}\,e^{-k|x_i|}. \tag{23}$$

The superscript signs are used to indicate that the positive asymptote $B^+$ 
is used if $s_i > C$ (i.e., for $s^+$); the negative asymptote $B^-$ is used if $s_i < C$ 
(i.e., for $s^-$). The R's thus defined are positive quantities if $s_i > C$, and 
negative if $s_i < C$. Note that as $|x_i|$ increases $|R_i|$ decreases. Then 

$$\hat{s}_{ij} = B^{\pm} - (B^{\pm} - C)\,a\,R_i^{a}R_j, \tag{24}$$

where $a = R_iR_j/|R_iR_j|$ and $|R_i| > |R_j|$. 
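
A minimal sketch of the computation in (23) and (24) follows. The function names are hypothetical. One assumption is made explicit: the asymptote used in (24) is taken to be the one on the side of the component with the smaller $|R|$, since that component fixes the sign of the composite. The parameter values 4.6, -1.8, and .8 are those quoted later for the fitted negative exponential law.

```python
def r_value(s, b_pos, b_neg, c):
    """R of equation (23): positive when s > c, negative when s < c."""
    b = b_pos if s > c else b_neg
    return (b - s) / abs(b - c)


def neg_exp_composite(s_i, s_j, b_pos, b_neg, c):
    """Estimate of the composite scale value from equation (24)."""
    r_i = r_value(s_i, b_pos, b_neg, c)
    r_j = r_value(s_j, b_pos, b_neg, c)
    if abs(r_i) < abs(r_j):          # enforce |R_i| >= |R_j|
        r_i, r_j = r_j, r_i
    a = 1.0 if r_i * r_j > 0 else -1.0
    # Assumption: use the asymptote on the side of the component with smaller |R|.
    b = b_pos if r_j > 0 else b_neg
    r_i_pow = r_i if a > 0 else 1.0 / r_i
    return b - (b - c) * a * r_i_pow * r_j


# Scale values taken from Table 2: tongue .137, lamb 1.043, beef 1.746, steak 2.197.
tongue_lamb = neg_exp_composite(0.137, 1.043, 4.6, -1.8, 0.8)
beef_steak = neg_exp_composite(1.746, 2.197, 4.6, -1.8, 0.8)
```

With these inputs the tongue and lamb estimate is about 0.27 and the beef and steak estimate about 2.80, against observed composite values of .270 and 2.622.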

Equation (24) gives a set of computations expressed entirely in s-scale 
values by means of which an estimate $\hat{s}_{ij}$ of the scale value of a composite 
may be computed from the scale values of the components, assuming a 
negative exponential law of value increase. 


Method of False Position 


It was also found that the test equations previously developed were not 
sensitive enough to differentiate clearly between the different value laws. 
The method adopted was that of using the component stimuli to predict the 
composite value—selecting the parameters to minimize the sum of the 
squares of the differences between the observed and predicted scale values 
for the composite stimuli. For the linear additive rule, the solution is straight- 
forward. For the others no explicit solution could be found, so the fit was 
made by a successive approximations procedure using Pearson’s Method 
of False Position (10). A brief account of the method presented in matrix 
notation will be given here. This method is a general solution for linear or 
nonlinear equations presented by Karl Pearson. The problem may be stated 
as follows: 

Given the k-parameter function 

$$Y_i = f(m_1, m_2, \ldots, m_g, \ldots, m_k, x_i)$$

together with experimental observations of the paired values $x_i$, $y_i$ 
($i = 1, \ldots, n$), to determine the values of $m_g$ so as to minimize 
$\sum_{i=1}^{n} (y_i - Y_i)^2$. 
Only two restrictive conditions are necessary: 

(1) Given the value of the independent variable $x_i$ and a set of arbitrary 
values $m_{hg}$ for each of the parameters, it must be possible to compute (or 
to obtain by mechanical or other means) a corresponding set of values $Y_{hi}$ 
for the dependent variable. 

(2) The function $Y_{hi} = f(m_{h1}, m_{h2}, \ldots, m_{hg}, \ldots, m_{hk}, x_i)$ must be 
continuous and have continuous derivatives for slight changes in the values 
of the parameters in the neighborhood of the desired solution. 

For a one-parameter function the Law of Mean Value may be stated as 

$$\left(\frac{dY_i}{dm}\right)_{p} = \frac{Y_{1i} - Y_{0i}}{m_1 - m_0},$$

or 

$$Y_{1i} - Y_{0i} = \left(\frac{dY_i}{dm}\right)_{p}(m_1 - m_0).$$

For a k-parameter function the general Law of the Mean for functions of 
several variables (cf. 9, p. 121) gives 

$$Y_{hi} - Y_{0i} = \sum_{g=1}^{k}\left[\left(\frac{\partial Y_i}{\partial m_g}\right)_{p}(m_{hg} - m_{0g})\right],$$

where 

i  is an index for the observations of the dependent and independent 
   variables ($i = 1, \ldots, n$), 

g  is an index for the parameters in the function ($g = 1, \ldots, k$), 

0  indicates the initial guess for the values of the parameters, 

h  indicates subsequent guesses for the value of the parameters 
   ($h = 1, \ldots, k$), 

p  indicates that the value of the partial derivative is taken at a point 
   p on the curve between $m_{0g}$ and $m_{hg}$, 

$m_{hg}$  indicates the hth estimate for the parameter $m_g$. 


Pearson’s derivation is algebraic and quite voluminous. A much briefer 
statement in terms of matrices has been given by Gale Young in an un- 
published note to the writer. This derivation is presented with acknowledg- 
ment to Dr. Young. We may put the derivation in matrix terminology as 
follows: Let 


Y be a matrix of k rows and n columns with elements $Y_{hi} - Y_{0i}$ 
($h = 1, \ldots, k$; $i = 1, \ldots, n$). 

M be a square matrix with elements $m_{hg} - m_{0g}$, where ($g = 1, \ldots, k$; 
$h = 1, \ldots, k$). 

$F_p$ be a matrix of k rows and n columns with elements $\partial Y_i/\partial m_g$. 


Then from the Law of Mean Value 

$$Y = MF_p, \quad\text{or}\quad M^{-1}Y = F_p.$$

Thus, we have a means of eliminating matrix $F_p$, for which it would be very 
difficult to find reasonable experimental values. In order for $M^{-1}$ to exist, 
M must be a square matrix; hence for k parameters there must be k + 1 
guesses for each parameter. Thus, both g and h must vary from 1 to k. It 
should also be noted that the partial derivatives in $F_p$ are taken at some 
suitable point p between $m_{0g}$ and $m_{hg}$, selected so that the Law of Mean 
Value holds. Let 


$y_i$ ($i = 1, \ldots, n$) designate the set of observed y values, 
$m_{bg}$ designate the parameter values which give the best fit, 
$Y_{bi}$ designate the corresponding values for the best fitting values of the Y's. 


We may now define the following row vectors: 


m with k elements $m_{bg} - m_{0g}$ designates the correction needed to change 
the first (or zero-th) guess into the best guess b, 

c with n elements $Y_{bi} - Y_{0i}$ designates the corresponding changes in the 
calculated Y's, 

d with n elements $y_i - Y_{0i}$ designates the difference between the zero-th 
approximation and the observed values, and 

e with n elements $y_i - Y_{bi}$ designates the error of fit for the best values. 

The problem may now be stated as follows: solve for the vector m in 
terms of Y, M, and d so that ee' is a minimum. 

From the definition of elements we see that 

$$e = d - c; \tag{25}$$

also 

$$c = mF_q, \tag{26}$$

where q designates a suitable point on the curve between $m_{0g}$ and $m_{bg}$. Thus, 

$$e = d - mF_q. \tag{27}$$

Selecting m so as to minimize ee' is the multiple regression problem, which 
is solved as indicated in (16, pp. 173-174). Following this procedure, 

$$ee' = dd' - dF_q'm' - mF_qd' + mF_qF_q'm'. \tag{28}$$

Differentiating with respect to m and setting the result equal to zero gives 

$$2mF_qF_q' - 2dF_q' = 0. \tag{29}$$

If the changes in parameter values are slight so that $F_q$ is approximately 
equal to $F_p$, then substituting $M^{-1}Y$ for $F_q$ and solving for m gives 

$$m = dY'(YY')^{-1}M. \tag{30}$$


Thus, we have an approximation for the correction term m expressed in terms 
of known values of the trial parameters M, Y, and d, the observed and pre- 
dicted values of the dependent variables. 

From the correction term m we can obtain a new vector of trial parameters 
$m_{(k+1)g}$, which should give a set of predictions $Y_{(k+1)i}$ which is better 
than any of the predictions $Y_{0i}$ to $Y_{ki}$ previously obtained. The new vector 
$m_{(k+1)g}$ can be substituted in the matrix M for the vector $m_{hg}$ giving the 
poorest fit, and the resulting set of k + 1 trial parameters used to obtain a 
second m vector by the use of (30). 
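
One correction step of (30) can be sketched with NumPy as below. This is an illustrative sketch, not the computing routine used in the study: the function name `false_position_step`, the argument layout, and the straight-line test function are all assumptions introduced here. For a function that is linear in its parameters a single step recovers the least squares solution exactly, which makes a convenient check.

```python
import numpy as np


def false_position_step(f, x, y, m0, trials):
    """Return m0 plus the correction m = d Y'(YY')^{-1} M of equation (30).

    f      : callable f(params, x) giving predicted values
    m0     : zero-th guess, length k
    trials : k further guesses, shape (k, k), one guess per row
    """
    m0 = np.asarray(m0, dtype=float)
    trials = np.asarray(trials, dtype=float)
    Y0 = f(m0, x)
    Y = np.array([f(m_h, x) for m_h in trials]) - Y0   # k x n, elements Y_hi - Y_0i
    M = trials - m0                                    # k x k, elements m_hg - m_0g
    d = y - Y0                                         # observed minus zero-th prediction
    w = np.linalg.solve(Y @ Y.T, Y @ d)                # (YY')^{-1} Y d'
    return m0 + w @ M


def line(m, x):
    # Hypothetical two-parameter test function: intercept m[0], slope m[1].
    return m[0] + m[1] * x


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x
m_new = false_position_step(line, x, y, m0=[0.0, 0.0],
                            trials=[[1.0, 0.0], [0.0, 1.0]])
```

For a nonlinear function the step would be repeated, each time replacing the poorest trial vector by the new one, as the text describes.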

This solution exhibits the critical requirements of the Method of False 
Position much more clearly than does the lengthier expression in terms of 
elementary algebra given by Pearson (10). Since YY’ must have an inverse, 
the rank must be k; that is, the different trial values of the parameters must 
be such that the result from one trial is not a linear function of the results 
from other trials. Correspondingly, since Y = MF, the various trial values 
of the parameters must be independent of each other, since YY’ will have a 
rank less than k if the rank of M is less than k. 

Thus, to the requirements stated at the beginning of this section, we 
must now add that the k changes in trial parameters must be linearly 
independent of each other and must result in a set of linearly independent Y's. 
For example, changing only one parameter for each set would result in a 
diagonal matrix for M which would clearly have a rank of k. 

The procedure is an iterative one. It may be necessary to apply it 
several times to find a minimum. It also is desirable to check in the vicinity 
of the minimum to be sure that the point found is approximately a minimum. 
Since the function being minimized is a quadratic, it will have only one 
minimum. 

Furthermore, the parameter changes and changes in error must be small 
enough so that a line, plane, or hyperplane is a good fit to the surface in 
the region being dealt with. Under these conditions the equation Y = MF 
will be a good approximation to the surface. Changing only one parameter 
at a time, so that M is a diagonal matrix, would be a simple method of 
satisfying this requirement. 

The Method of False Position with some of its extensions and limitations 
has been discussed by Willers (17). He also presents other methods for solving 
problems of the type considered here. 


Predicting Scale Values of Composite Stimuli 


The four laws (equations 19, 20, 21, and 24) were used to predict the 
scale values of the composite stimuli from the scale values of their component 
stimuli. 

For the negative exponential law four sets of values were chosen for 
$B^-$, $B^+$, and C. These values were used with (24) to give sets of values for 
$\hat{s}_{ij}$. The parameter values were used to construct the matrix M. The values 
of $\hat{s}_{ij}$ gave matrix Y. Using $s_{ij}$ and $\hat{s}_{ij}$ gave the vector d. Equation (30) is 
then used to find a correction which gives a fifth set of values of the parameters, 
which is better than any of the first four. This process is repeated until a 
minimum is found and tested. For a one-parameter system the process is 
similar but much simpler. To give a measure of goodness of fit we have 
presented the sum of the squares of the discrepancies as well as the sum of 
the absolute values of the discrepancies. These values are shown in Table 3. 

It can be seen that the logarithmic and square-root laws in this case 
give the largest discrepancies between the actual values of the composites 
and the values as predicted from the single stimuli. Therefore, we shall not 
consider either the logarithmic or square-root laws in further detail. 

Both the linear and negative exponential laws placed the zero point 
between pork and lamb, where previous consideration showed it reasonably 
came. 


TABLE 3 

Four Value Laws Compared 

                            Asymptote         Additive 
                          Pos.      Neg.      Constant 
Law                       $B^+$     $B^-$     $C$         $\sum|s_{ij}-\hat{s}_{ij}|$   $\sum(s_{ij}-\hat{s}_{ij})^2$ 

Negative Exponential      4.6       -1.8      +0.8        1.069           .205 
Linear                    ∞         -∞        +0.94       1.658           .434 
Logarithmic               ∞         -∞        +1.10       2.177           .568 
Square Root               ∞         -∞        +1.13       2.487           .815 


FIGURE 1 


Figure 1 shows the test for the linear value increase: the plot of the 
scale value of the composite against the sum of the scale values of the 
components. This plot gives a reasonably good fit to the line 

$$\hat{s}_{ij} = (s_i + s_j) - .9373.$$

Thus, the best estimate of the zero point on the assumption of the linear 
law is .9373, or about .94 if we assume that $x_0 = 0$. 
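
The intercept of the unit-slope line can be recomputed from the scale values as a check. In the sketch below the least squares intercept is simply the mean of $s_i + s_j - s_{ij}$ over the ten composites. The pork entry is taken as .541, an assumed reading of a value that is hard to read in Table 2; with it the ten composites give the reported intercept.

```python
# Scale values of the single items (pork = .541 is an assumed reading).
single = {"tongue": 0.137, "pork": 0.541, "lamb": 1.043,
          "beef": 1.746, "steak": 2.197}

# Observed scale values of the composites from Table 2.
composite = {
    ("tongue", "pork"): 0.000, ("tongue", "lamb"): 0.270,
    ("tongue", "beef"): 0.830, ("tongue", "steak"): 1.088,
    ("pork", "lamb"): 0.928, ("pork", "beef"): 1.448,
    ("pork", "steak"): 1.780, ("lamb", "beef"): 1.993,
    ("lamb", "steak"): 2.324, ("beef", "steak"): 2.622,
}

sums = [single[a] + single[b] for (a, b) in composite]
observed = list(composite.values())

# With unit slope fixed, the least squares zero point C is the mean difference.
C = sum(s - o for s, o in zip(sums, observed)) / len(observed)

# Discrepancies between observed and predicted composite values.
residuals = [o - (s - C) for s, o in zip(sums, observed)]
sum_abs = sum(abs(r) for r in residuals)
sum_sq = sum(r * r for r in residuals)
```

Under these assumptions `C` reproduces the .9373 of the fitted line, and `sum_abs` agrees to three decimals with the 1.658 given for the linear law in Table 3.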

A more detailed analysis showing the fit for the negative exponential 
is shown in Figure 2. Here we have the value of tongue, .137, plotted on the 
abscissa, and over it the value of each of the composites with tongue— 
tongue and pork, tongue and lamb, tongue and beef, and tongue and steak. 
The same has been done for pork, lamb, beef, and steak and their composites 
with each of the other four stimuli. 

The lines show how a negative exponential rule would fit the data, 
given that the zero point is at .8, that the upper asymptote is at 4.6 and the 
lower one at -1.8. We have a family of five lines, the upper line indicating 
the values $v_{ij}$ for all composites with steak, designated S. The lower line 
indicates the values $v_{ij}$ for all composites with tongue, designated T. 
Correspondingly, the other three lines show the values for composites with pork, 
lamb, and beef. These lines all converge at the two points (4.6, 4.6) and 
(-1.8, -1.8), corresponding to the two asymptotes. It can be seen that 
the fit is good. 

FIGURE 2 

Negative Exponential Law 


These data are not adequate to discriminate between the different 
value laws even though the discrepancies are smallest for the negative 
exponential. More points are needed, particularly more points in the upper 
right quadrant, e.g., where both components are positive and hence the 
composite is positive. Similarly, if one has negative values, there should be 
more of them, so that the negative components would combine to form a 
number of different negative composites. It had been expected that all the 
values in this case would be positive and hence that five components might 
have been adequate. 


Summary 


Four different value laws have been developed: a square-root, a log- 
arithmic, a negative exponential, and an additive law. A method has been 
presented for testing each law using only the scale values of components and 
composite stimuli determined by a psychophysical scaling method. In this 
case paired comparisons was used. A tentative extension of these rules has 
been made to cover the case in which both negative and positive components 
are combined. 

It has been shown that either the linear or the negative exponential 
law gives a good fit to limited data on food preferences. Also, it should be 
noted that persons do make consistent judgments about preferences for 
composite stimuli of the type used here, so that it is experimentally feasible 
to secure consistent judgments involving (i and j vs. g) or (i and j vs. g and h) 
in addition to the usual (i vs. j) type of choice. 

Thus, we have a procedure for investigating the laws governing prefer- 
ences, or value judgments, in areas where there is no readily available method 
for obtaining a physical measure of the amount of the commodity, and in 
which the usual scaling methods do not give a zero point. 


MAXIMIZING TEST BATTERY PREDICTION WHEN THE 
WEIGHTS ARE REQUIRED TO BE NON-NEGATIVE 


JOSEPH LEV 
NEW YORK STATE DEPARTMENT OF CIVIL SERVICE 


A procedure is developed for computing optimum regression weights 
under the restriction that they be non-negative. The weights maximize, 
subject to the restrictions, the multiple correlation between several predictors 
and a criterion. A numerical example is provided. 


For some purposes weights of subtests of an examination must be 
positive. This is true, for example, in civil service examinations when weights 
are announced in advance of administration of the examinations. Positive 
weights are needed in order to motivate candidates to perform well in the 
subtests. The problem considered in this paper is the computation of weights 
of subtests so as to maximize prediction of a criterion, with the restriction 
that the weights be non-negative. 

The procedure to be described is similar to the method of "steepest 
descent" (e.g., 1, p. 47). Actually it is more appropriate to call this a method 
of steepest ascent, since the weights are determined by successive addition of 
increments which tend to increase the multiple correlation between the 
predictors and the criterion. 

An iterative procedure to accomplish the purpose considered in this 
paper was previously described in (2). Advantages of the present method 
are that the computations at each step in the iteration indicate the procedure 
at the next step, the termination of the procedure is clearly indicated, and the 
weights obtained are the best possible. 

In order to write the necessary formulas, the following notation will 
be adopted: 


$X_0$ = criterion 

$X_i$ = predictors ($i = 1, 2, \ldots, n$) 

$a_i$ = weights for predictors ($i = 1, 2, \ldots, n$) 

$T = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ = total weighted predictor score 

$C_{ii} = \sum (X_i - \bar{X}_i)^2$ = sum of squares for $X_i$ 

$C_{ij} = \sum (X_i - \bar{X}_i)(X_j - \bar{X}_j)$ = sum of products for $X_i$ and $X_j$ 

$R$ = correlation between $X_0$ and $T$ 


The $L_{ij}$ used by Wherry and Gaylord may be used here in place of the 
$C_{ij}$. The $C_{ij}$ are smaller numbers than the $L_{ij}$ and are sufficient for the 
accuracy required. In this notation 

$$R = \frac{\sum_{i=1}^{n} a_iC_{0i}}{\sqrt{C_{00}}\,\sqrt{\sum_{i,j=1}^{n} a_ia_jC_{ij}}}. \tag{1}$$

In (1) R may be viewed as a function of the variables $a_i$; the $C_{ij}$ are 
constants. It is convenient to introduce the total differential 

$$dR = \frac{\partial R}{\partial a_1}\,da_1 + \frac{\partial R}{\partial a_2}\,da_2 + \cdots + \frac{\partial R}{\partial a_n}\,da_n. \tag{2}$$


Here $dR$ is an increment in R, $da_i$ are increments in the weights, and 
$\partial R/\partial a_i$ are partial derivatives. (2) shows how changes in weights influence 
changes in R. If partial derivatives have been computed for fixed values of 
the $a_i$, values of the $da_i$ can be selected so as to make $dR$ positive; that is, 
increments can be chosen for the weights so as to cause an increase in R. 
If increments are chosen corresponding to large values of the partial 
derivatives, the increase in R to the maximum can be made with few iterations. 
For this reason the present procedure is a method of "steepest ascent." 

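For computation of the partial derivatives we note that 

$$\frac{\partial R}{\partial a_i} = \Bigl(C_{00}\sum_{g,j} a_ga_jC_{gj}\Bigr)^{-1/2}\Bigl[\,C_{0i} - k\sum_{j} a_jC_{ij}\Bigr], \tag{3}$$

where $k = \sum_i a_iC_{0i} \big/ \sum_{i,j} a_ia_jC_{ij}$. 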

In the computations to be presented, only relative values of the partial 
derivatives are required. Consequently, only the quantities within the square 
brackets are computed. 
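
The quantities in (1) and (3) can be sketched with NumPy as follows. The function name and the small two-predictor example are hypothetical, chosen only so the arithmetic can be followed by hand; `C` holds the sums of squares and products among predictors, `c0` the products with the criterion, and `C00` the criterion sum of squares.

```python
import numpy as np


def multiple_R_and_brackets(C, c0, C00, a):
    """Return R of equation (1), the ratio k, and the bracketed terms of (3)."""
    C = np.asarray(C, dtype=float)
    c0 = np.asarray(c0, dtype=float)
    a = np.asarray(a, dtype=float)
    S = C @ a                 # sum_j a_j C_ij for each predictor i
    N = a @ c0                # sum_i a_i C_0i
    D = a @ S                 # sum_ij a_i a_j C_ij
    R = N / np.sqrt(C00 * D)
    k = N / D
    return R, k, c0 - k * S   # bracketed terms: relative partial derivatives


# Hypothetical example: two uncorrelated predictors, weight on the first only.
R, k, brackets = multiple_R_and_brackets(np.eye(2), [3.0, 4.0], 25.0, [1.0, 0.0])
```

Here the bracket for the weighted predictor is zero and the bracket for the unweighted one is positive, so the next increment would go to the second predictor.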

In essence the computation involves repeated trials of increments in 
the weights, $da_i$, so that each trial produces a positive increment, $dR$, in R 
with the restriction that the $a_i$ be non-negative. Values of the $da_i$ selected at 
any trial are determined by the values of the partial derivatives computed 
at the previous trial. The trials are terminated when all of the partial 
derivatives are either zero (or nearly zero) or definitely negative. At this stage 
the terminal weights $a_i$ are computed as the sums of the increments tried 
at the various iterations. For those variables which have zero partial 
derivatives the terminal weights should be positive or zero. For the variables 
which have negative partial derivatives, the terminal weights should be zero. 

It is evident that a maximum in R has been reached at the termination 
point described above. For the variables having zero partial derivatives, the 
terminal values of the $a_i$ provide the usual maximum in R. For the remaining 
variables an increase in R can be obtained only by use of negative weights. 
A justification for the procedure will be discussed more fully after the 
computations have been described. 


Computational Procedures 


The computing procedures will be described first symbolically, and then 
in relation to a numerical application. The symbolic development which 
follows is outlined in Table 1. 

1. Record the sums of squares and cross products of the predictor and 
criterion variables as a square matrix. 

2. It is helpful, though not essential, to record below this matrix the 
correlations between the predictors and the criterion. 

3. Below the matrix record the increments corresponding to the 
iterations in the appropriate columns. All the increments corresponding to one 
iteration are placed in a single row. The symbol $da_{mi}$ will be used to denote 
the increment for the variable $X_i$ in the mth iteration. 

4. After m iterations the weight $a_{mi}$ for variable $X_i$ is the sum of the 
increments used to this point, 

$$a_{mi} = \sum_{t=1}^{m} da_{ti}.$$

The terminal weight for $X_i$ is the $a_{mi}$ after all iterations have been completed. 

5. The first increment is obtained for the variable, $X_i$, which has 
highest correlation with the criterion. It is an approximation of one or two 
digits to the ratio $C_{0i}/C_{ii}$. Increments for the remaining variables are 
ordinarily, though not necessarily, taken to be zero in the first iteration. 
These increments are also the weights after the first iteration. That is, 
$a_{1i} = da_{1i}$ for each predictor $X_i$. 

6. The computations which lead to the next iteration are carried out in 
columns set up at the right of the matrix of sums of squares and cross products. 

7. In the first column of this set-up the following computations are 
made successively: 


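$$\sum_j a_{1j}C_{1j},\quad \sum_j a_{1j}C_{2j},\quad \ldots,\quad \sum_j a_{1j}C_{nj},\quad \sum_j a_{1j}C_{0j},$$

$$\sum_i\sum_j a_{1i}a_{1j}C_{ij} = \sum_i a_{1i}\Bigl(\sum_j a_{1j}C_{ij}\Bigr) \qquad (i, j = 1, 2, \ldots, n),$$

$$k_1 = \Bigl(\sum_j a_{1j}C_{0j}\Bigr)\Big/\Bigl(\sum_i\sum_j a_{1i}a_{1j}C_{ij}\Bigr),$$

$$C_{00}R^2 = k_1\sum_j a_{1j}C_{0j}. \tag{4}$$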


Formula (4) provides a measure of the correlation between the weighted 
sum and the criterion, attained as a result of the first iteration. A similar 
quantity is computed at each iteration. 




















TABLE 1 

Symbolic Lay-out for Computation of Weights, Showing Two Iterations 


The succeeding computations have as their purpose an evaluation of the 
partial derivatives exhibited in (2). We compute 

$$C_{01} - k_1\sum_j a_{1j}C_{1j},\quad \ldots,\quad C_{0n} - k_1\sum_j a_{1j}C_{nj}.$$

For the first column some simplification of the above computations is possible. 
However, the saving is slight and the formulas are stated in full generality 
as they apply to all columns. Note that the partial derivative which 
corresponds to the variable having positive weight becomes zero, or nearly so. 

8. The increment for the second iteration is based on the values of the 
partial derivatives obtained from the first iteration. Ordinarily we choose 
the partial derivative which has the highest positive value and select an 
increment for the corresponding variable. If this variable is $X_i$, then the 
increment $da_{2i}$ is chosen so that 

$$C_{0i} - k_1\sum_j a_{2j}C_{ij} = 0$$

approximately. Here the $a_{2j}$ are the weights after the second iteration. 
Increments for more than one predictor may be chosen at any iteration if this 
seems appropriate. 

9. To perform the computations for the second iteration we take 
advantage of the relationship 

$$\sum_j a_{2j}C_{ij} = \sum_j a_{1j}C_{ij} + \sum_j (da_{2j})C_{ij}.$$

However, this simplification is not appropriate for the computation of 
$\sum_i\sum_j a_{2i}a_{2j}C_{ij}$. The remaining computations are carried out as described 
for the first iteration. 

10. The iterations are continued until the partial derivatives are 
decidedly negative, or approximately zero, and the quantity $k_m\sum_j a_{mj}C_{0j}$ 
seems to have attained a maximum. 
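
The steps above can be sketched as a short routine. This is a hedged illustration, not the hand procedure itself: in the paper the computer chooses one- or two-digit increments by judgment, whereas the sketch assumes an automatic rule (move the weight whose bracketed derivative is most out of line, by the amount that would bring that derivative to zero at the current k, never letting a weight go below zero). The function name and the two-predictor example are hypothetical.

```python
import numpy as np


def nonnegative_weights(C, c0, max_iter=1000, tol=1e-9):
    """Iterate increments until every derivative is zero (positive weights)
    or non-positive (zero weights). Weights are determined up to scale."""
    C = np.asarray(C, dtype=float)
    c0 = np.asarray(c0, dtype=float)
    a = np.zeros(len(c0))
    # Step 5: begin with the predictor most highly correlated with the criterion.
    first = int(np.argmax(c0 / np.sqrt(np.diag(C))))
    if c0[first] <= 0:
        return a                      # no predictor correlates positively
    a[first] = c0[first] / C[first, first]
    for _ in range(max_iter):
        S = C @ a
        k = (a @ c0) / (a @ S)
        g = c0 - k * S                # bracketed terms of equation (3)
        # A zero weight may move only upward; a positive weight may move either way.
        need = np.where(a > 0, np.abs(g), np.maximum(g, 0.0))
        i = int(np.argmax(need))
        if need[i] < tol:
            break
        a[i] = max(0.0, a[i] + g[i] / (k * C[i, i]))
    return a


# Hypothetical example: the unrestricted solution would give the second
# predictor a negative weight, so its restricted weight stays at zero.
weights = nonnegative_weights([[1.0, 0.5], [0.5, 1.0]], [1.0, 0.2])
```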

Details of a numerical example are shown in Table 2. The first positive 
increment, $da_{12} = 0.6$, is an approximation to the quotient 1790/3246. This 
increment is also $a_{12}$. The computations based on this increment are shown in 
the column headed I. The first seven entries in this column are products of 
the increment .6 and the cross products under $X_2$, as .6(2802) = 1681, ..., 
.6(1790) = 1074. The eighth entry is .6(1948) = 1169; $k_1$ is 1074/1169 = 
.9187; $k_1\sum_j a_{1j}C_{0j} = C_{00}R^2$ is given as .9187(1074) = 986.7. The last six 
entries in this column indicate the relative values of the partial derivatives. 
The greatest of these is the one related to $X_6$ and has the value 2520 - 
(.9187)(1654) = 1000. This value of the partial derivative suggests the 
positive increment $da_{26} = 0.1$ for the second iteration. Computations in the 
column headed II will be given in full. Notice that 


    1681 + .1(1794)     = 1860 
    1948 + .1(2757)     = 2224 
     769 + .1(1581)     =  927 
     970 + .1(2206)     = 1191 
     342 + .1(-1263)    =  216 
    1654 + .1(13380)    = 2992 
    1074 + .1(2520)     = 1326 

    .6(2224) + .1(2992) = 1634 
    1326/1634           = .8115 
    .8115(1326)         = 1076.0 

    1254 - (.8115)(1860) = -256 
    1790 - (.8115)(2224) =  -16 
     534 - (.8115)(927)  = -219 
     979 - (.8115)(1191) =   12 
     -65 - (.8115)(216)  = -240 
    2520 - (.8115)(2992) =   90 


Iterations and computations continue as shown in Table 2. At iteration 
5 the process stops because the partial derivatives for variables $X_2$, $X_4$, $X_6$ 
are nearly zero and the remaining partial derivatives are negative. Note also 
the stability of $C_{00}R^2$ over the last three iterations. The result is a multiple 
R of .2994, with regression weights 0, .4723, 0, .0039, 0, .0905. 
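
The column II arithmetic can be retraced with a few lines of NumPy, using only numbers quoted in the text. The variable names are hypothetical. Because the sketch carries unrounded intermediate values, its results differ from the printed ones in the last digit or so.

```python
import numpy as np

# Products of predictors X_1..X_6 with the criterion, C_0i, as quoted in the text.
c0 = np.array([1254, 1790, 534, 979, -65, 2520], dtype=float)
# Column I sums, sum_j a_1j C_ij, for X_1..X_6, and the sum for the criterion.
S1 = np.array([1681, 1948, 769, 970, 342, 1654], dtype=float)
N1 = 1074.0
# Cross products under X_6, the variable that receives the second increment.
col6 = np.array([1794, 2757, 1581, 2206, -1263, 13380], dtype=float)

da26 = 0.1
a2 = np.array([0.0, 0.6, 0.0, 0.0, 0.0, 0.1])   # weights after iteration 2

S2 = S1 + da26 * col6           # step 9: update the column sums
N2 = N1 + da26 * c0[5]          # 1074 + .1(2520)
D2 = a2 @ S2                    # .6(2224) + .1(2992)
k2 = N2 / D2
C00R2 = k2 * N2
brackets = c0 - k2 * S2         # relative partial derivatives, column II
```

The values agree with the printed column to within rounding: `k2` is close to .8115, and the brackets are close to -256, -16, -219, 12, -240, and 90.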

All the increments tried in this numerical example have been positive. 
In some circumstances negative increments may be tried. This is true when a 
positive weight has been too large as shown by a negative partial derivative 
for a variable having positive weight. 

An evaluation of the method is called for at this point. One may ask 
whether the weights obtained by this method actually provide the highest 
possible R under the restriction of non-negative weights. Before dealing 
with this question it is necessary to consider whether there exists a set of 
positive values $a_i$ which yield a maximum for R. 

One way of demonstrating that a maximum actually exists is to note 
first that under ordinary conditions a set of weights maximizing R can be 
found for any selection of variables out of the n given variables. Some of the 
selections of variables will give a maximum R with non-negative weights. 
There are a finite number of these selections. Consequently, we may choose 
that one set of variables which provides the largest maximum R under the 
required condition. Weights obtained in this way are the desired weights. 
Thus, the existence of a maximum is demonstrated. In exceptional circum- 
stances it may happen that all weights are negative. This will be shown by 
the computation. 

The existence of a maximum R under the condition of non-negative 
weights having been established, it is necessary to demonstrate that the 
procedure described in this paper actually leads to this maximum. This will 
be demonstrated with the aid of a theorem. 

The necessary and sufficient condition for the set of weights $a_1, a_2, \ldots, a_n$ 
to maximize R, with the provision that none of the weights be negative, is that 
the partial derivatives $\partial R/\partial a_i$ be zero for variables $X_i$ with positive weights, 
and negative or zero for variables $X_i$ with zero weights. 

Necessity: This has already been discussed. If a set of non-negative 
values of the $a_i$ maximizing R has been attained then the condition implies 
that R can be further increased only by changing from zero to negative some 
of the $a_i$, for variables having negative partial derivatives. Any change in 
values of the $a_i$ for variables which have zero partial derivatives will decrease 
R. Thus necessity is demonstrated. 

Sufficiency: To demonstrate sufficiency, suppose that the procedure of 
this paper has provided a set of $a_i$ satisfying the conditions of the theorem 
with corresponding R. Suppose now that there is another set of values of the 
$a_i$ also satisfying the conditions of the theorem with a value of $R' > R$. 
But this is impossible, for some of the partial derivatives for the first set of 
$a_i$ should then be positive. Thus sufficiency is also demonstrated. 

Since the procedure described in this paper leads to zero or negative 
partial derivatives, the procedure provides a unique maximum under the 
restriction to non-negative weights. 


THE MATCHING PROBLEM* 

EDGAR J. GILBERT 

HARVARD UNIVERSITY† 


Tables of the exact distributions of number of matches are given for 
small decks having the same number of cards in each suit. Several approxi- 
mate distributions are considered for use with larger decks, and some indi- 
cation of the goodness of the approximations is given. 


Many psychological experiments involve testing the ability of a person 
(or a method) to classify certain objects into categories. The matching- 
problem technique to be described in this paper provides a mathematical 
model which may be used to calculate significance levels for such experiments. 

The example of the matching problem which is most often cited is the 
work done in certain of the early ESP (extra-sensory perception) experiments. 
A deck of 25 cards containing 5 each of 5 different figures (circle, cross, wave, 
square, and star) was shuffled into a random order by the experimenter. The 
subject was given a second deck of the same composition and asked to arrange 
it in the same order as the hidden deck of the experimenter. Then the two 
decks were compared. If the first card of the subject’s deck was of the same 
kind (circle, cross, etc.) as the first card of the experimenter’s deck, the 
subject scored a “match.” Then the second card in each deck was examined, 
and so on through the two decks. The subject’s ability was scored according 
to the total number of correct matches, as compared to the number to be 
expected by chance alone. 

For another example, suppose a handwriting expert claims he can tell a 
person’s profession by examining a sample of his handwriting. To test his 
ability he is given 10 samples, 2 written by doctors, 4 by lawyers, and 4 by 
teachers. He is told which professions are represented but not how many 
samples are from each profession. If we number the samples from 1 to 10, 
placing under each number the true profession and then the expert’s guess, 
we might get the following result: 


Sample Number:     1   2   3   4   5   6   7   8   9   10
True Profession:   L   D   L   T   D   T   ?   T   ?   ?
Expert’s Guess:    L   D   T   T   D   T   D   L   D   D


*The calculations for this paper were done while the writer was working on a project
sponsored by funds from the Ford Foundation. The writer wishes to express his gratitude
to Frederick Mosteller for his constant help and encouragement during the writing
of this paper.

†Now at University of California, Berkeley.

In order to evaluate the expert’s ability on the basis of this test, we
might think of the two rows “True Profession” and “Expert’s Guess” as
being the arrangements of a target deck and a call deck, respectively, and 
imagine that the expert was attempting a kind of matching problem similar 
to that of the ESP experiment described above. We notice that he called 
correctly cards numbered 1, 2, 4, 5, and 6, for a total of 5 correct matches. 
However, we see that this case is different from the ESP case in that the 
deck used by the expert does not have the same composition as that used 
by the experimenter; so we must allow for this in calculating the probability 
of his getting the 5 matches by chance. The probabilities for this particular 
example will be worked out in the last section of the paper, but first the 
more easily calculated cases where both decks are identical will be considered. 

In general, the model will consist of two decks of cards: a target deck, 
the suits of which correspond to the kinds of objects being classified, and a 
call deck whose suits correspond to the categories used by the subject in his 
classification. We imagine that the cards in the target deck have some arbi- 
trary arrangement and calculate the probability distribution of the number 
of correct matches between cards in the two decks on the assumption that 
the call deck was arranged by random shuffling. If the subject has any ability, 
scores as high as or higher than his will have low probability. 

Several variations of the problem arise, depending on the composition 
of the call deck. If the person to be tested is informed of the categories to be 
used and the number of objects in each category, and is forced to use this 
information, then the call deck is identical with the target deck in composition. 
If he is not so informed or not forced to use the information, he may classify 
the objects in such a way that his call deck is different from the target deck. 
In this case, his ability to choose categories is not perfect. In order to use 
the matching problem technique, the calculation of the probability of various 
numbers of matches must be based on the composition of the call deck 
actually used. It is always possible to assume that both decks have the 
same suits, however, by using the trick of saying a suit which is not repre- 
sented actually is present, but with zero cards in it. 

In this paper, tables of the exact distributions of numbers of matches 
for small decks having the same number of cards in each suit will be given. 
Several methods of obtaining approximate distributions for larger decks will 
be indicated. The problems of decks with different numbers of cards in different 
suits, and of non-identical decks, will be discussed. 


Exact Distributions 


The simplest form of the matching problem is the case where both the 
target deck and the call deck have n distinct cards, i.e., where any given 
card of the call deck can match one and only one card in the target deck. 
This is a special form of a problem which has a long history in the mathematical
literature. For example, Feller (3) gives a formula for calculating
the probability distribution of the number of matches in this case and gives 
tables of the calculated values for decks of size 3, 4, 5, 6, and 10. He shows 
that for deck sizes larger than 10, the values are, to five decimal places, the 
same as for the Poisson distribution with unit mean. 
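
The agreement with the Poisson can be checked numerically. The sketch below is not taken from Feller; it assumes the classical inclusion-exclusion formula for n distinct cards, P(h) = (1/h!) Σ (-1)^k / k! with k running from 0 to n - h, and the function names are illustrative only.

```python
from math import exp, factorial


def distinct_card_match_probs(n):
    """Return [P(0 matches), ..., P(n matches)] for two decks of n distinct cards."""
    probs = []
    for h in range(n + 1):
        # Inclusion-exclusion: P(h) = (1/h!) * sum_{k=0}^{n-h} (-1)^k / k!
        alternating = sum((-1) ** k / factorial(k) for k in range(n - h + 1))
        probs.append(alternating / factorial(h))
    return probs


def poisson_unit_mean(h):
    """Poisson probability of h when the mean is 1."""
    return exp(-1.0) / factorial(h)


def largest_gap(n):
    """Largest absolute difference between the exact and Poisson probabilities."""
    exact = distinct_card_match_probs(n)
    return max(abs(p - poisson_unit_mean(h)) for h, p in enumerate(exact))


# Print the largest gap for a few deck sizes.
for n in (5, 10, 12):
    print(n, f"{largest_gap(n):.1e}")
```

For a deck of 12 the largest gap falls below half a unit in the fifth decimal place, in line with the statement above about decks larger than 10.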

A special case of the matching problem arises when there are only two
suits in each deck and each suit has c cards in it, making a total of 2c cards
in the deck. There is no possible arrangement of the call deck which will
give an odd number of matches, while the number of arrangements giving
h matches, when h is an even integer, is $\binom{c}{h/2}^2$, the square of the number of
combinations of c things taken h/2 at a time. Since this is the square of a
binomial coefficient, it is a simple matter to look up the binomial coefficients
corresponding to h = 0, 2, 4, ..., 2c, square them, and add the results to
get the total number of ways of arranging the deck. Then divide the number
of arrangements which give h matches by the total number of arrangements
to get the probability of exactly h matches. For example, if each suit has 2
cards in it, then there are $\binom{2}{0}^2 = 1$ way of getting 0 matches, $\binom{2}{1}^2 = 4$ ways
of getting 2 matches, and $\binom{2}{2}^2 = 1$ way of getting 4 matches. So the probability
of exactly h matches when h = 0, 1, 2, 3, 4 is 1/6, 0, 4/6, 0, 1/6.
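
The two-suit recipe is short enough to sketch directly. The function name below is hypothetical, and exact fractions are used so that the c = 2 case can be compared with the values just given.

```python
from fractions import Fraction
from math import comb


def two_suit_match_probs(c):
    """Probabilities of h = 0..2c matches for two decks of two suits, c cards each."""
    # Total number of arrangements: the sum of the squared binomial coefficients.
    total = sum(comb(c, k) ** 2 for k in range(c + 1))
    probs = []
    for h in range(2 * c + 1):
        if h % 2:
            # An odd number of matches is impossible with two suits.
            probs.append(Fraction(0))
        else:
            probs.append(Fraction(comb(c, h // 2) ** 2, total))
    return probs


print([str(p) for p in two_suit_match_probs(2)])
```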

The next easiest case to calculate is that of two identical decks, each 
having s suits of c cards per suit. It is this situation, where the probability 
of exactly h matches is a function of the three numbers: s, the number of 
suits; c, the number of cards in each suit; and h, the number of matches, 
which will be the primary concern in this paper. Denote this probability by 
m(s, c; h) and write the probability of h or more matches (which is the sum 
of the probabilities for h, h + 1, h + 2, etc.) as M(s, c; h). It is this latter
probability which is usually wanted in finding the significance of experimental 
results. We usually want to know the probability of a subject’s doing as well 
as or better than his result by chance alone. 

Even in this case, few tables have been published giving the exact 
probabilities. Huntington (8) gives tables of m(3, 3; h) and m(4, 4; h). Greville 
(5) gives a table of m(5, 5; h), the probabilities for the ESP experiment 
referred to earlier. Greenwood (4) gives a table of estimated values of m(4, 13; 
h) for values of h from 0 to 7, where both decks are the ordinary 52-card 
bridge decks. There may be other tables available in the literature; but it 
was thought that a small collection of tables in one place might be useful. 

Greville (6) derived a formula for the matching problem distribution 
which seemed to the writer to be more adapted to calculation of exact proba- 
bilities than some others in the literature. His formula was for a more general 
case than that we are now considering and will be used in its general form in
the last section of the paper. The following special case of Greville’s formula
was used to calculate the values given in Table 1 (n = sc = size of deck):

$$m(s, c; h) = \frac{1}{n!} \sum_{j=h}^{n} (-1)^{j-h} \binom{j}{h} (n - j)!\, H_j , \tag{1}$$

where $H_j$ is the coefficient of $x^j$ in the expansion of

$$\left[ \sum_{k=0}^{c} \frac{c!\, c!\, x^k}{k!\, [(c - k)!]^2} \right]^{s} . \tag{2}$$

Since it is usually the probability of h or more matches which is wanted 
in checking experimental data, Table 1 lists $M(s, c; h) = \sum_{i=h}^{n} m(s, c; i)$.
The calculations were carried out to exact integers before dividing by n!, 
the divisions made to 6 decimals, and the values of M(s, c; h) rounded to 
five decimals after summing. The table is arranged first by groups in order of 
increasing c, then within groups in order of increasing s. Within each group 
it will be noticed that the distributions seem to approach a limiting distri- 
bution—the Poisson—but that the approach is slower for groups with 
larger c. It will also be noticed that the first distribution in each group [that 
for M(2, c; h)] has the entry for odd h equal to the entry for the following
even h. This is due to the fact mentioned earlier, that the probability in this 
case of exactly h matches, when h is odd, is zero. 

The values M(5, 5; h) given in Table 1 were obtained by summing to 
five decimals the values given to 7 decimals by Greville (5). All other values 
in Table 1 were calculated independently by the writer, using the tables of 
Feller and Huntington as a check where there were duplications. 
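
The calculation behind a table of this kind can be sketched in a few lines. The code below is an illustration, not the writer's desk-calculator procedure: it assumes formula (1) in the form (1/n!) Σ (-1)^(j-h) C(j, h) (n - j)! H_j with H_j taken from the expansion (2), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def exact_match_probs(s, c):
    """m(s, c; h) for h = 0..n, where n = s*c, as exact fractions."""
    n = s * c
    # One suit's factor in (2): coefficients (c!)^2 / (k! ((c-k)!)^2), always integers.
    suit = [factorial(c) ** 2 // (factorial(k) * factorial(c - k) ** 2)
            for k in range(c + 1)]
    H = [1]
    for _ in range(s):
        H = poly_mul(H, suit)
    probs = []
    for h in range(n + 1):
        total = sum((-1) ** (j - h) * comb(j, h) * factorial(n - j) * H[j]
                    for j in range(h, n + 1))
        probs.append(Fraction(total, factorial(n)))
    return probs


def tail_probs(probs):
    """M(h): probability of h or more matches, for every h."""
    out, running = [], Fraction(0)
    for p in reversed(probs):
        running += p
        out.append(running)
    return out[::-1]


print([round(float(M), 5) for M in tail_probs(exact_match_probs(3, 3))])
```

Working in integers until the final division by n! mirrors the order of operations described above and avoids rounding error in the alternating sum.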


Approximate Distributions 

Since the calculation of exact probabilities gets very tedious for large 
decks, it would be desirable to be able to use distributions which either are 
already available, or are easier to calculate, to approximate significance 
levels for the number of matches. As has already been mentioned, Feller’s 
tables (3) show that a Poisson distribution is satisfactory for the approxi- 
mation of m(s, 1; h), when s is only moderately large. A study of the dis- 
tributions within each group with the same value of c (Table 1) leads one to 
suspect that this is true also of groups other than c = 1; but as c gets large, 
the rate of approach to Poisson with increasing s is much slower. 

In order to find more accurate approximations, we need to know more 
about the moments of the matching problem distribution. We shall continue 
throughout this section to work with the relatively simple case of identical 
decks with the same number of cards in each suit. For this case, the first 
four moments were derived by means of a rather ingenious use of determinants
by Olds (9). If $\nu$ is the mean and $\mu_i$ the ith moment about the mean,
then in terms of c, the number of cards per suit, and n = sc, the size of the
deck, we have

$$
\begin{aligned}
\nu &= c \\
\mu_2 &= \frac{c(n - c)}{n - 1} \\
\mu_3 &= \frac{c(n - c)(n - 2c)}{(n - 1)(n - 2)} \\
\mu_4 &= \frac{c(n - c)}{(n - 1)(n - 2)(n - 3)} \left[ (n - 2c)(n - 3c)(3c + 1) + (c - 1)(12nc - n - 18c^2 - 6c) \right].
\end{aligned}
\tag{3}
$$
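
As a check on the moment formulas, the sketch below evaluates them exactly and compares them with moments computed from two small distributions quoted earlier (two suits of two cards, and four distinct cards). The function names are hypothetical, and the factor (3c + 1) in the fourth moment is a reading of a poorly printed formula that this comparison supports.

```python
from fractions import Fraction


def matching_moments(s, c):
    """Mean and 2nd to 4th central moments from formulas (3); needs n = s*c >= 4."""
    n = s * c
    if n < 4:
        raise ValueError("formulas (3) divide by (n - 1)(n - 2)(n - 3)")
    nu = Fraction(c)
    mu2 = Fraction(c * (n - c), n - 1)
    mu3 = Fraction(c * (n - c) * (n - 2 * c), (n - 1) * (n - 2))
    bracket = ((n - 2 * c) * (n - 3 * c) * (3 * c + 1)
               + (c - 1) * (12 * n * c - n - 18 * c * c - 6 * c))
    mu4 = Fraction(c * (n - c) * bracket, (n - 1) * (n - 2) * (n - 3))
    return nu, mu2, mu3, mu4


def central_moments(probs):
    """Mean and 2nd to 4th central moments of a distribution on 0, 1, 2, ..."""
    mean = sum(h * p for h, p in enumerate(probs))
    higher = [sum((h - mean) ** r * p for h, p in enumerate(probs)) for r in (2, 3, 4)]
    return (mean, *higher)


two_suits = [Fraction(1, 6), 0, Fraction(2, 3), 0, Fraction(1, 6)]
print(matching_moments(2, 2) == central_moments(two_suits))
```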


Anderson (1) shows that as the deck size is increased while keeping the 
proportion of cards in each suit fixed, the number of matches is asymptotically 
normally distributed. So if n is large enough, the mean and variance of the 
matching problem distribution can be calculated from formulas (3), and the 
desired values found in a table of the normal distribution. Unfortunately, it 
is difficult to say precisely when n is large enough. In many cases, the match- 
ing problem distribution is asymmetrical; to the extent that this is so, the 
normal will give a bad fit. Although the normal is not as close an approximation 
as some of the distributions considered later in this section, it can be seen 
from the examples in Table 2 that it is sufficiently close for some purposes; 
it has the advantage of being available without too much calculation. An 
example of the calculations involved in fitting a normal to a matching problem 
is worked out in the last section of this paper. 

Hamilton (7) noted that the matching problem distribution was some- 
what like a binomial distribution with p = 1/s and n = sc, and suggested 
that writing the mean of the matching problem as np and variance as 
npq[n/(n - 1)] made the similarity apparent. It can be seen that the binomial
with p = 1/s has the same mean as the matching problem, and that the
ratio of the two variances approaches 1 as n gets large. This is also the case
with the third moment about the mean. It is npq(2q - 1) for the binomial,
and we can rearrange the formula for $\mu_3$ in (3) above, substituting np for
c where it appears and setting q = 1 - p, to make the third moment of the
matching problem look like $npq(2q - 1)\,n^2/[(n - 1)(n - 2)]$.

The binomial is suggested as a good approximating distribution for 
another reason—extensive tables of the binomial are available (11). Since 
the binomial is a two-parameter distribution, we can in many cases find a 
tabled distribution which has both mean and variance equal to that of the 
matching problem distribution. In this case, we shall not look up the same 
values of n and p used in the last paragraph; we shall treat n’ and p’ simply 
as two parameters, using the prime to show that these are different numbers. 
If we replace n in (3) by sc and equate the means and variances of the matching
problem distribution and the binomial, we obtain a pair of simultaneous equations
which we can solve for n' and p' in terms of s and c:

$$
\begin{aligned}
\nu &= c = n'p' \\
\mu_2 &= c^2(s - 1)/(sc - 1) = n'p'(1 - p').
\end{aligned}
$$

The solutions are

$$
\begin{aligned}
n' &= c(sc - 1)/(c - 1) \\
p' &= (c - 1)/(sc - 1).
\end{aligned}
\tag{4}
$$

TABLE 2

Approximate Distributions

G-C is the Gram-Charlier approximation; G-CB is the binomial modification of
Gram-Charlier. Values of M(4,13;h) and its 2nd order G-C are taken from
Greenwood (4), summed to four decimal places. The binomial approximation to
M(4,13;h) was chosen by the method of the text to have the same mean and
variance as the exact distribution.

         Exact        Poisson      Binomial            4th Order
  h      M(8,2;h)     ν = 2        n = 25; p = .08     G-C
  0      1.00000      1.00000      1.00000             1.00000
  1       .87335       .86466       .87564              .87330
  2       .60334       .59399       .60528              .60340
  3       .32369       .32332       .32317              .32361
  4       .13676       .14288       .13509              .13671
  5       .04640       .05265       .04514              .04643
  6       .01285       .01656       .01229              .01287
  7       .00295       .00435       .00277              .00295
  8       .00056       .00110       .00052              .00055
  9       .00009       .00024       .00008              .00009
 10       .00001       .00005       .00001              .00001

         Exact        Poisson      Binomial            4th Order
  h      M(8,3;h)     ν = 3        n = 30; p = .10     G-CB
  0      1.00000      1.00000      1.00000             1.00000
  1       .95645       .95021       .95761              .95646
  2       .81402       .80085       .81630              .81400
  3       .58715       .57681       .58865              .58712
  4       .35295       .35277       .35256              .35295
  5       .17702       .18474       .17549              .17704
  6       .07467       .08392       .07319              .07468
  7       .02674       .03351       .02583              .02674
  8       .00820       .01191       .00778              .00820
  9       .00217       .00380       .00202              .00217
 10       .00050       .00110       .00045              .00050
 11       .00010       .00029       .00009              .00010
 12       .00002       .00007       .00002              .00002

         Exact         2nd Order     Binomial
  h      M(4,13;h)     G-C           n = 68; p = 1/17    Normal
  0      1.00000       1.00000       1.00000             .990
  1       .9838         .9839         .9839              .904
  2       .9149         .9149         .9149              .902
  3       .7707         .7705         .7708              .780
  4       .5725         .5723         .5726              .602
  5      [illegible]   [illegible]   [illegible]         .398
  6       .2099         .2103         .2102              .210
  7       .1047         .1045         .1045              .098


Note that n’ is not the size of the deck but is just a parameter for which 
we have solved. It may or may not be an integer. If it is, and p’ is a value 
which can be found in a table, then the binomial distribution with parameters 
n’ and p’ will have the same mean and same variance as the matching problem 
distribution with s suits of c cards each. If n’ is not an integer, the integer 
nearest to n' can be used, call it n'', and using p'' = c/n'' gives a binomial
with the correct mean and very nearly the correct variance. The writer 
would guess, based on trying a few cases, that for decks larger than 25, one 
should be able to fit a binomial by this method to about three decimal places. 
Table 2 shows some examples of the use of the binomial as an approximation. 
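
A small sketch of this fitting step follows, assuming the solutions n' = c(sc - 1)/(c - 1) and p' = (c - 1)/(sc - 1) of (4). The function names are hypothetical, and Python's `round` sends exact halves to the even integer, which is one possible reading of "nearest integer".

```python
from fractions import Fraction


def binomial_fit(s, c):
    """Return (n', p') matching the mean and variance of m(s, c; h). Needs c > 1."""
    if c <= 1:
        raise ValueError("the solution (4) is undefined for c = 1")
    n_prime = Fraction(c * (s * c - 1), c - 1)
    p_prime = Fraction(c - 1, s * c - 1)
    return n_prime, p_prime


def rounded_fit(s, c):
    """Return (n'', p''): n' rounded to an integer, with p'' = c/n'' keeping the mean."""
    n_prime, _ = binomial_fit(s, c)
    n_rounded = round(n_prime)  # exact halves go to the even neighbour
    return n_rounded, Fraction(c, n_rounded)


print(binomial_fit(13, 4))   # integer n': 13 suits of 4 cards
print(rounded_fit(4, 13))    # non-integer n': 4 suits of 13 cards
```

For 13 suits of 4 cards this gives n' = 68 and p' = 1/17, the binomial parameters that head the third block of Table 2.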

This approach suggests looking for known, or easily calculated, distri- 
butions which have the first two, three, or more moments in common with 
the matching problem distribution. Greenwood (4), following a suggestion 
by Mantel, used the Gram-Charlier series type B as an approximation. The 
Gram-Charlier uses the Poisson as a first approximation to the desired 
distribution; suitable multiples of the first, second, and successive differences 
of the Poisson are added to correct for the mean, second moment, etc. If
p(h) is the Poisson probability of h, then the first difference, $\Delta p(h)$, is defined
to be p(h) - p(h - 1). The second difference, $\Delta^2 p(h)$, is the first difference of
the first difference, i.e., $\Delta p(h) - \Delta p(h - 1)$, and so on for higher differences.
The reader will note that $\Delta p(0)$ is not well defined yet, since we must know
what value to use for p(-1); this is not ordinarily found in a table of the
Poisson distribution. For purposes of the Gram-Charlier series, p(h) is defined
and equal to zero for all negative values of h. We are now ready to define
the Gram-Charlier series. Where m(h) is the distribution being approximated,

$$m(h) = p(h) + a_1 \Delta p(h) + a_2 \Delta^2 p(h) + a_3 \Delta^3 p(h) + \cdots . \tag{5}$$

In principle, this is an infinite series; in practice, only the first few terms
are used. The constants $a_i$ appearing in (5) are determined by the moments
of the matching problem and of the particular Poisson distribution used.
The Poisson is usually chosen so that it has the same mean as the distribution
being approximated; this makes the value of $a_1$ zero. If $\mu_i$ is the ith moment
about the mean of the matching problem distribution and $m_i$ the corresponding
moment for the Poisson, the next three constants for the Gram-Charlier
type B are given by

$$
\begin{aligned}
2a_2 &= \mu_2 - m_2 \\
6a_3 &= 6a_2 - (\mu_3 - m_3) \\
24a_4 &= 36a_3 - 2a_2(6m_2 + 7) + (\mu_4 - m_4).
\end{aligned}
\tag{6}
$$

As an example, suppose we wish to find an approximate value for
m(7, 2; 3), the probability of exactly three matches when both decks have
seven suits of two cards per suit. (From Table 1 we see that the correct value
is .32383 - .13580 = 0.18803.) As a first approximation, we look up the
Poisson with mean 2, and find p(3) = 0.18045. This value is 0.00758 too low.
For the next approximation, we need to know the variances and the value
of $\Delta^2 p(3)$:

$$
\begin{aligned}
\mu_2 &= c(n - c)/(n - 1) = 2(14 - 2)/(14 - 1) = 24/13 \\
m_2 &= c = 2 \\
a_2 &= \tfrac{1}{2}(24/13 - 2) = -1/13 \\
\Delta p(3) &= p(3) - p(2) = 0.18045 - 0.27067 = -0.09022 \\
\Delta p(2) &= p(2) - p(1) = 0.27067 - 0.27067 = 0.00000 \\
\Delta^2 p(3) &= \Delta p(3) - \Delta p(2) = -0.09022 \\
a_2 \Delta^2 p(3) &= (-1/13)(-0.09022) = 0.00694.
\end{aligned}
$$

Therefore, we have as a second approximation $p(3) + a_2 \Delta^2 p(3) = 0.18739$,
which is only 0.00064 too low. If we wished a still closer approximation,
we could calculate the terms involving $a_3$ and $a_4$. An example of a fourth-
order approximation is given in Table 2.
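
The worked example can be reproduced with a short sketch. It assumes the constants of (6) in the form 2a_2 = μ_2 - m_2, 6a_3 = 6a_2 - (μ_3 - m_3), 24a_4 = 36a_3 - 2a_2(6m_2 + 7) + (μ_4 - m_4), together with the standard Poisson central moments m_2 = m_3 = λ and m_4 = λ + 3λ². The function names are illustrative, not from the paper.

```python
from math import comb, exp, factorial


def poisson_pmf(h, lam):
    """Poisson probability of h; defined as zero for negative h, as in the text."""
    if h < 0:
        return 0.0
    return exp(-lam) * lam ** h / factorial(h)


def backward_diff(h, lam, order):
    """The order-th difference of the Poisson, built from p(h) - p(h - 1)."""
    return sum((-1) ** k * comb(order, k) * poisson_pmf(h - k, lam)
               for k in range(order + 1))


def gram_charlier_b(h, lam, mu2, mu3=None, mu4=None):
    """Type B series started from a Poisson with the same mean (so a_1 = 0)."""
    m2, m3, m4 = lam, lam, lam + 3 * lam ** 2  # Poisson central moments
    a2 = (mu2 - m2) / 2
    approx = poisson_pmf(h, lam) + a2 * backward_diff(h, lam, 2)
    if mu3 is None:
        return approx
    a3 = a2 - (mu3 - m3) / 6
    approx += a3 * backward_diff(h, lam, 3)
    if mu4 is None:
        return approx
    a4 = (36 * a3 - 2 * a2 * (6 * m2 + 7) + (mu4 - m4)) / 24
    return approx + a4 * backward_diff(h, lam, 4)


# Seven suits of two cards: n = 14, c = 2, moments taken from formulas (3).
mu2, mu3, mu4 = 24 / 13, 20 / 13, 19152 / 1716
print(round(poisson_pmf(3, 2), 5))           # first approximation
print(round(gram_charlier_b(3, 2, mu2), 5))  # second approximation
print(round(gram_charlier_b(3, 2, mu2, mu3, mu4), 5))
```

In this example a_3 works out to zero, so the fourth-order value differs from the second-order one only through the a_4 term.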

If the binomial is used as a first approximation, higher-order approximations
may be obtained by a modification of the Gram-Charlier series
which uses the binomial instead of the Poisson. The formulas (5) and (6)
are valid for this modification, provided that we re-interpret p(h) and $m_i$ to
be the probability and moments of the binomial. It has been the writer’s 
experience that the Poisson works well as a starting distribution when c is 
2 or 3, and that the binomial gives a better fitting curve when c is more than 
3. For decks of 24 or more cards, when all suits have the same number of 
cards, truncating the Gram-Charlier series (or the binomial modification) 
after terms which involve $a_4$ gives an approximation correct to about 0.00003
or less. For a more detailed description of the Gram-Charlier series and a 
derivation of the constants involved see Rietz (10). 


More Complicated Decks 


If the target deck does not have the same composition as the call deck, 
or if the suits do not all have the same number of cards, the estimation of 
significance levels may be more complicated. In many cases, the first compli- 
cation can be avoided by the simple device of making sure the person who is 
attempting the classification knows what the categories are and how many 
objects are in each category. When it is possible to design the experiment 
so that this is done, several advantages are gained. The calculations for each 
subject or trial are simpler for identical decks. Also, if each subject selects
his own categories, there may be a different set of calculations necessary for 
each subject; whereas if all subjects use the same deck, only one distribution 
need be calculated. Finally, while the writer knows of no investigations of 
the power of this type of test, it seems clear that the subject has a better 
chance of showing his ability, if he has any, when both decks are the same. 
Since, in using the matching problem technique, the calculations are based 
on the decks actually used, there is no way to give credit for choosing the 
proper categories. 

The calculations are also somewhat more complicated if different cate- 
gories contain different numbers of objects. However, in many cases this 
is not a part of the experimental design which can be changed. For small 
decks, the exact distribution may have to be calculated. If so, the following 
procedure, taken from Greville (6), may be used. (This also works for calculating
non-identical decks.) Suppose the two decks have the following composition:

                                   Suit
                         1      2      3     ...     s      Total
Cards     Call Deck      m_1    m_2    m_3   ...     m_s      n          (7)
Per Suit  Target Deck    n_1    n_2    n_3   ...     n_s      n

Let $M_i$ be the smaller of $m_i$ and $n_i$ for each i = 1, 2, ..., s. Then the probability
of exactly h matches, m(h), is given by

$$m(h) = \frac{1}{n!} \sum_{j=h}^{n} (-1)^{j-h} \binom{j}{h} (n - j)!\, H_j , \tag{8}$$

where $H_j$ is the coefficient of $x^j$ in the expansion of

$$\prod_{i=1}^{s} \sum_{k=0}^{M_i} \frac{m_i!\, n_i!\, x^k}{k!\, (m_i - k)!\, (n_i - k)!} .$$



As an example, suppose we wish to calculate the distribution of matches 
in the case of the handwriting expert given at the beginning of the paper. 
The two decks have the composition:

                              Suit
                         1      2      3      Total
Cards     Call Deck      5      2      3       10          (9)
Per Suit  Target Deck    2      4      4       10


We notice one peculiar feature of non-identical decks at this point. Although
both decks have 10 cards each, only 7 matches are possible: 2 in the first
suit, 2 in the second, and 3 in the third. Thus, we will expect to get non-zero
values of m(h) only for h = 0, 1, ..., 7.

First expand the generating function for the decks in (9) to get $H_j$:

$$(1 + 10x + 20x^2)(1 + 8x + 12x^2)(1 + 12x + 36x^2 + 24x^3)$$
$$= 1 + 30x + 364x^2 + 2296x^3 + 8064x^4 + 15648x^5 + 15360x^6 + 5760x^7 .$$

Then calculate the factor $Q_j = (n - j)!\,H_j$ for each j = 0, 1, ..., 7.
Next calculate

$$n!\,m(h) = \sum_{j=h}^{7} (-1)^{j-h} \binom{j}{h} Q_j$$

for each h = 0, 1, ..., 7, where $\binom{j}{h}$ is a binomial coefficient, which can be
obtained from tables. With a table of binomial coefficients and the values of
$Q_j$, which we have calculated, this sum can be obtained rather quickly on a
desk calculator. Finally, divide each value by n! to get m(h), the probability
of exactly h matches. In order to get the probability of h or more matches, add
$\sum_{i=h}^{7} m(i)$, getting the values listed in Table 3.

TABLE 3

Handwriting Expert Example

  h      Exact       Normal     Binomial
  0     1.00000      0.9922     1.00000
  1      .96984       .9582      .9718
  2      .84762       .8508      .8507
  3      .62064       .6368      .6172
  4      .35556       .3632      .3504
  5      .15238       .1492      .1503
  6      .04444       .0418      .0474
  7      .00952       .0078      .0106
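
The steps for the handwriting example (expand the generating function, form Q_j, take the alternating sum, divide by n!, then accumulate) can be sketched as follows. The function names are hypothetical, and the general formula is assumed in the form given in (8); exact integer arithmetic stands in for the desk calculator.

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def greville_match_probs(call, target):
    """Return (H, m): generating-function coefficients and exact match probabilities."""
    n = sum(call)
    if n != sum(target):
        raise ValueError("both decks must hold the same number of cards")
    H = [1]
    for m_i, n_i in zip(call, target):
        # One suit's factor: m_i! n_i! / (k! (m_i - k)! (n_i - k)!), k = 0..min(m_i, n_i)
        factor = [factorial(m_i) * factorial(n_i)
                  // (factorial(k) * factorial(m_i - k) * factorial(n_i - k))
                  for k in range(min(m_i, n_i) + 1)]
        H = poly_mul(H, factor)
    Q = [factorial(n - j) * H_j for j, H_j in enumerate(H)]
    top = len(H) - 1  # largest possible number of matches
    probs = []
    for h in range(top + 1):
        total = sum((-1) ** (j - h) * comb(j, h) * Q[j] for j in range(h, top + 1))
        probs.append(Fraction(total, factorial(n)))
    return H, probs


H, m = greville_match_probs([5, 2, 3], [2, 4, 4])
M = [sum(m[h:]) for h in range(len(m))]  # probability of h or more matches
print(H)
print([round(float(x), 5) for x in M])
```

The list `H` should agree with the expanded generating function above, and the tail sums with the "Exact" column of Table 3.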





To approximate this distribution, using either the normal or the binomial, 
first calculate the mean and variance. For this purpose use a formula derived 
by Battin (2). (Battin gives a brief review of the mathematical literature on 
the matching problem prior to 1942 and lists an extensive bibliography.)
He uses the technique of generating functions to get formulas applicable to 
decks of arbitrary composition. He also extends the matching problem to 
more than two decks, with the possibilities of two-card matches, three-card 
matches, etc. Here, only his results for two decks are used. 

If the two decks have the composition specified in (7) above, then the
mean number of matches, $\nu$, and the variance, $\sigma^2$, are given by:

$$
\begin{aligned}
\nu &= \frac{1}{n} \sum m_i n_i \\
\sigma^2 &= \frac{1}{n^2(n - 1)} \left[ \Big( \sum m_i n_i \Big)^2 - n \sum m_i n_i (m_i + n_i) + n^2 \sum m_i n_i \right].
\end{aligned}
\tag{10}
$$

These formulas have been rearranged slightly from the form in which Battin
listed them, for convenience in calculating. The products $m_i n_i$ and
$m_i n_i (m_i + n_i)$ can be obtained quickly from the table (7) giving the composition
of the deck, and the rest of the calculation is straightforward. We shall outline
the calculations for the handwriting example.











  i                        1      2      3     Total
  m_i                      5      2      3      10
  n_i                      2      4      4      10
  m_i n_i                 10      8     12      30
  m_i + n_i                7      6      7
  m_i n_i (m_i + n_i)     70     48     84     202

$\nu = 30/10 = 3$
$\sigma^2 = [1/((100)(9))][(30)^2 - 10(202) + 100(30)]$
$\sigma^2 = 1880/900 = 2.0889$
$\sigma = \sqrt{2.0889} = 1.45.$
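
The same arithmetic can be written as a short function. This is a sketch with a hypothetical name, assuming the variance formula has denominator n²(n - 1), which is what the worked figures (100)(9) imply.

```python
from fractions import Fraction


def match_mean_variance(call, target):
    """Mean and variance of the number of matches for decks of arbitrary composition."""
    n = sum(call)
    if n != sum(target):
        raise ValueError("both decks must hold the same number of cards")
    s1 = sum(m * t for m, t in zip(call, target))            # sum of m_i n_i
    s2 = sum(m * t * (m + t) for m, t in zip(call, target))  # sum of m_i n_i (m_i + n_i)
    mean = Fraction(s1, n)
    variance = Fraction(s1 ** 2 - n * s2 + n ** 2 * s1, n ** 2 * (n - 1))
    return mean, variance


mean, variance = match_mean_variance([5, 2, 3], [2, 4, 4])
print(mean, variance, round(float(variance) ** 0.5, 2))
```

For identical decks with c cards in every suit the variance reduces to c(n - c)/(n - 1), the second moment in (3).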


Since the normal is a continuous distribution, while the matching 
problem distribution is discrete with probability “concentrated” on the 
integers, we must make a correction when using the normal as an approxi- 
mation. That is, instead of the probability that X = h, we want the prob- 
ability that X is between h — 1/2 and h + 1/2; if we want the probability 
of h or more matches, we must find Prob(X > h — 1/2), where X is normally 
distributed with mean 3 and standard deviation 1.45. (In the special case 
where each deck has only two suits, the probability is concentrated on the 
even integers, and we must subtract 1 instead of 1/2 in order to get the 
proper correction.) Since we have available a table of the distribution of 
Y, a normal variate with zero mean and unit standard deviation, we look up 


$$\mathrm{Prob}\left\{ Y > \frac{h - \tfrac{1}{2} - 3}{1.45} \right\} .$$


The values found for h = 0, --- , 7 are listed in Table 3. 

The last column in Table 3 gives the binomial approximation to our
matching problem. The binomial which came closest to fitting was that for 
which n = 10 and p = .3, which has mean 3 and variance 2.10. 
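
Both approximating columns of Table 3 can be recomputed with the standard library. The sketch below uses hypothetical function names and evaluates the normal integral with `math.erfc` rather than a printed table, so the normal values may differ from the tabled ones in the third decimal place.

```python
from math import comb, erfc, sqrt


def normal_tail(h, mean, sd):
    """Prob(X > h - 1/2) for normal X: the continuity-corrected chance of h or more."""
    z = (h - 0.5 - mean) / sd
    return 0.5 * erfc(z / sqrt(2))


def binomial_tail(h, n, p):
    """Probability of h or more successes in n binomial trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(h, n + 1))


for h in range(8):
    print(h, round(normal_tail(h, 3, 1.45), 4), round(binomial_tail(h, 10, 0.3), 4))
```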

It is shown that Estes’ formula for the asymptotic behavior of a subject
under conditions of partial reinforcement can be derived from the assumption 
that the subject is behaving rationally in a certain game-theoretic sense and 
attempting to minimax his regret. This result illustrates the need for specifying 
the frame of reference or set of the subject when using the assumption of 
rationality to predict his behavior. 


Learning theory and game theory (together with the closely related 
statistical decision theory) purport to provide theories of rational behavior. 
Implicit in any theory of learning is a motivational assumption that learning 
consists in the acquisition of a pattern of behavior appropriate to goal 
achievement, need reduction, or the like. In parallel fashion game theory and 
statistical decision theory are concerned with discovering the course of action 
in a particular situation that will optimize the attainment of some objective 
pay-off. 

In order to gain a better understanding of the concepts of rationality 
underlying these two bodies of theory, it would be interesting to construct a 
situation in which predictions made from these theories could be compared 
and then checked against experimental data on actual behavior. One situation 
of this kind received considerable attention at the Santa Monica Conference 
on Decision Processes (2, 3, 4). The experiment is one involving partial re- 
inforcement. At each trial the subject chooses between two alternatives. 
Each alternative is rewarded on a certain per cent of the trials in which it is 
chosen (the trials rewarded being randomly determined) and not rewarded 
on the remaining trials in which it is chosen; the per cent of rewarded trials 
is in general different for the two alternatives. The learning theory advanced 
by Estes provides a prediction as to the frequency (in the limit as the number 
of trials increases) with which the first alternative will be chosen in preference 
to the second (2). The same frequency is predicted by the Bush-Mosteller 
theory when certain assumptions of symmetry are made with respect to the 
parameters that appear in their model (1, ch. 8). Estes reports several 
experiments that confirm predictions from his theory. 

When this experimental situation was described to a number of game 
theorists at the Santa Monica conference, they pointed out that a rational 
individual would first estimate, by experimenting, which of the two alterna- 
tives had the greatest probability of reward, and would subsequently always 
select that alternative, which is not the behavior predicted by the Estes theory.
Flood has defended the choices predicted by the Estes theory against the 
charge of irrationality, basing his defense on two points (4, p. 288): 


(a) The proper definition of payoff utilities would be unclear in attempts to apply 
game-theoretic arguments to a real case, and there is a reasonable payoff matrix that would 
rationalize the reported behavior. Thus, if the organism’s object were to maximize its score
rather than its expectation, then it should sometimes not tend to use a pure strategy... 

(b) The von Neumann-Morgenstern game theory is inapplicable in this situation 
unless the organism can assume safely that the experimental stimulus is generated by a 
stationary stochastic process. For example, if the organism believes that there may be some 
pattern (non-stationarity) over time, in the stimulus, then it can often do better by using a 
mixed strategy rather than a pure one, for the latter would give it no way to discover any 
pattern effect. 


In the next section, by combining in an appropriate fashion the two 
considerations advanced by Flood—that is, by assuming (a) the subject is 
trying to maximize something other than expected payoff, and (b) the subject 
does not believe or expect that the probability of reward from each alternative 
is fixed—it will be shown that the behavior predicted by the Estes theory 
and actually observed in experiments is rational in the sense of game theory 
(or at least in one of the many senses consistent with game theory). In a 
final section, the implications of this result will be discussed briefly. 


Game-Theoretical Derivation of Estes’ Result 


Consider a partial reinforcement experiment in which there are two
alternatives of behavior, $A_1$ and $A_2$. If $A_1$ is chosen on a particular trial,
it is rewarded with probability $\pi_1$; if $A_2$ is chosen, it is rewarded with
probability $\pi_2$. Let $p_1(t)$ be the probability that the subject chooses $A_1$ on the
tth trial; $p_2(t) = 1 - p_1(t)$, the probability that he chooses $A_2$. From the
postulates of his learning model, Estes (2) predicts that the asymptotic
value of $p_1$ as the number of trials increases will be $p_1^*$,

$$p_1^* = \frac{1 - \pi_2}{(1 - \pi_1) + (1 - \pi_2)} . \tag{1}$$

This value for $p_1^*$ may be obtained as the steady state of the stochastic
process

$$p(t + 1) = \Pi\, p(t), \quad \text{where } \Pi = \begin{pmatrix} \pi_1 & 1 - \pi_2 \\ 1 - \pi_1 & \pi_2 \end{pmatrix}, \tag{2}$$

for

$$p_1(t + 1) = \pi_1 p_1(t) + (1 - \pi_2)[1 - p_1(t)], \tag{3}$$
so that, if $p_1(t + 1) = p_1(t) = p_1^*$,

$$(1 - \pi_1 + 1 - \pi_2)\, p_1^* = (1 - \pi_2), \tag{4}$$

from which (1) follows immediately.
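
The steady state can also be reached numerically by iterating the recursion p_1(t + 1) = π_1 p_1(t) + (1 - π_2)[1 - p_1(t)]. The sketch below uses hypothetical function names and illustrative reward probabilities (0.8 and 0.4) that are not taken from any experiment.

```python
def estes_asymptote(pi1, pi2):
    """Asymptotic probability of choosing A_1; undefined when pi1 = pi2 = 1."""
    return (1 - pi2) / ((1 - pi1) + (1 - pi2))


def iterate_choice_probability(pi1, pi2, p1=0.5, trials=200):
    """Apply p1 <- pi1*p1 + (1 - pi2)*(1 - p1) for a fixed number of trials."""
    for _ in range(trials):
        p1 = pi1 * p1 + (1 - pi2) * (1 - p1)
    return p1


pi1, pi2 = 0.8, 0.4  # illustrative values only
print(estes_asymptote(pi1, pi2), iterate_choice_probability(pi1, pi2))
```

The iteration contracts by a factor of π_1 + π_2 - 1 on each trial, so the starting value of p_1 does not affect the limit as long as that factor is less than 1 in absolute value.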

We see that in Estes’ theory $\pi_1$ is the probability that $A_1$ will be rewarded;
but it is also the asymptotic probability that, having chosen $A_1$ on a given
trial, the subject will choose it again on the next succeeding trial. A similar
interpretation can be given to $\pi_2$. Hence, we may interpret $\pi_1$ and $\pi_2$ as
the conditional probabilities of persistent behavior when the subject has just
chosen $A_1$ or $A_2$, respectively; while $(1 - \pi_1)$ and $(1 - \pi_2)$ are the corre-
sponding conditional probabilities of a shift in behavior.

For specificity, let us consider the case where $\pi_1 > \pi_2$. Then the
game-theoretical objection to regarding as rational the asymptotic behavior
predicted by the Estes model is that the subject could increase his expected
reward by always choosing $A_1$. For then the expected reward would be

$$\pi_1 > p_1^* \pi_1 + (1 - p_1^*)\pi_2, \tag{5}$$


where the terms on the right-hand side of the inequality are easily seen to 
be the expected reward for the Estes model. 

But the rationality of this game-theoretical solution is compelling only 
under the assumption that the reward probabilities are known to the subject, 
and known to be constant. These are the assumptions that Flood challenges. 
Let us consider an alternative set of assumptions which, while not the only 
possible such set, has some plausibility. 

(i) The subject takes as given and fixed the $\pi$ corresponding to the
alternative he has chosen on the last trial. That is, he assumes the probability
of reward to be $\pi_1$ or $\pi_2$, if he persists in choosing again $A_1$ or $A_2$, as the
case may be.

(ii) The subject expects that if he shifts from the alternative just
chosen to the other one, the probability of reward is unknown and dependent
on a strategy of nature.

(iii) The subject does not wish to persist in his present behavior if there
is a good chance of increased reward from shifting. He measures his success
on each trial not from the reward received, but from the difference between
the reward actually received and the reward that could have been attained
if he had outguessed nature. In the terminology of L. J. Savage, he wishes
to minimize his regret.

We may formalize these assumptions as follows: On each trial, the
subject chooses between (i) persisting or (ii) shifting his choice. If he persists,
he is rewarded with probability $\pi$ (where $\pi = \pi_1$ or $\pi = \pi_2$ depending on
whether the previous choice was $A_1$ or $A_2$, respectively), irrespective of the
strategy adopted by nature. If he shifts, he will receive a reward of 0 if
nature adopts her strategy ($\alpha$), and a reward of 1 if nature adopts her strategy
($\beta$). The payoff matrix corresponding to these assumptions is:

$$\begin{array}{c|cc} & (\alpha) & (\beta) \\ \hline (i) & \pi & \pi \\ (ii) & 0 & 1 \end{array}$$





where rows correspond to the subject’s strategies and columns to nature’s
strategies. Regret is defined as the difference between the actual payoff for
a given pair of strategies [e.g., (i), ($\beta$)], and the payoff that could have been
realized, if the strategy actually employed by nature had been anticipated
[e.g., (ii), ($\beta$)]. Performing the indicated subtractions, the regret matrix is:

$$\begin{array}{c|cc} & (\alpha) & (\beta) \\ \hline (i) & 0 & \pi - 1 \\ (ii) & -\pi & 0 \end{array}$$


(This was obtained from the first matrix by subtracting from each element
the largest element in the same column.)
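
The column-wise subtraction can be sketched in a few lines. This is an illustrative example only: the function name is hypothetical, and the value 0.7 for the reward probability of persisting is an assumed number, not one taken from the text.

```python
def regret_matrix(payoff):
    """Subtract each column's maximum from every entry in that column."""
    column_max = [max(column) for column in zip(*payoff)]
    return [[entry - best for entry, best in zip(row, column_max)]
            for row in payoff]


if __name__ == "__main__":
    pi = 0.7  # hypothetical reward probability for persisting
    payoff = [[pi, pi],    # (i) persist, against nature's (alpha) and (beta)
              [0.0, 1.0]]  # (ii) shift
    for row in regret_matrix(payoff):
        print(row)
```

Under this sign convention every regret entry is zero or negative, with zero marking the choice that would have been best had nature's strategy been anticipated.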
Now let p be the probability that the subject uses strategy (i), i.e.,
persists, and $\mu$ be the probability that nature uses strategy ($\alpha$). Then the expected
regret will be

$$R = p\mu \cdot 0 + p(1 - \mu)(\pi - 1) + (1 - p)\mu(-\pi) + (1 - p)(1 - \mu) \cdot 0 = p(1 - \mu)(\pi - 1) - (1 - p)\mu\pi. \tag{6}$$

The conditions that the regret be minimum (strictly, minimax) are
given by

$$\frac{\partial R}{\partial p} = 0, \qquad \frac{\partial R}{\partial \mu} = 0. \tag{7}$$

Using the second of these equations, we obtain from (6)

$$0 = -p(\pi - 1) - (1 - p)\pi, \tag{8}$$

whence

$$p = \pi. \tag{9}$$


Hence the subject would persist with probability $\pi$ and shift with
probability $(1 - \pi)$. But this is precisely the postulate contained in (2).
Hence, we have shown that the behavior predicted by Estes’ theory is identical
with that which would be exhibited by a rational subject intent on minimaxing
his regret.
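
The minimax solution can be illustrated with a small numerical search. This is a sketch under stated assumptions: the function names are hypothetical, the grid search replaces the calculus of (7) and (8), and regret is coded with the sign convention of (6), so its magnitude is the negative of the returned value.

```python
def expected_regret(p, mu, pi):
    """Equation (6): expected regret when the subject persists with
    probability p and nature plays strategy (alpha) with probability mu."""
    return p * (1 - mu) * (pi - 1) - (1 - p) * mu * pi


def worst_case_loss(p, pi):
    """Largest regret magnitude nature can force. Expected regret is linear
    in mu, so checking nature's two pure strategies is sufficient."""
    return max(-expected_regret(p, 0.0, pi), -expected_regret(p, 1.0, pi))


def minimax_persist_probability(pi, steps=1000):
    """Grid search over p in [0, 1] for the smallest worst-case loss."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda p: worst_case_loss(p, pi))


if __name__ == "__main__":
    pi = 0.7  # hypothetical reward probability
    print(minimax_persist_probability(pi))
```

At p equal to $\pi$ the expected regret no longer depends on nature's choice of $\mu$, which is the equalizing property that equation (8) expresses.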


Comments on the Derivation 


We need not try to decide whether the subjects who behave in conformity 
with the predictions of Estes’ theory are minimaxing regret, or whether they 
are simply behaving in the adaptive fashion implied by the usual learning 
mechanisms. Most economists and statisticians would be tempted to accept 
the former interpretation, most psychologists the latter. It is not immediately 
obvious what source, other than introspection, would provide evidence for 
deciding the issue. 

Perhaps the most useful lesson to be learned from the derivation is the 
necessity for careful distinctions between subjective rationality (i.e., behavior 
that is rational, given the perceptual and evaluational premises of the subject), 
and objective rationality (behavior that is rational as viewed by the experi- 
menter). Because this distinction has seldom been made explicitly by 
economists and statisticians in their formulations of the problem of rational 
choice, considerable caution must be exercised in employing those formu- 
lations in the explanation of observed behavior. 

To the experimenter who knows that the rewards attached to the two
behaviors $A_1$ and $A_2$ are random, with constant probabilities, it appears
unreasonable that the subject should not learn to behave in such a way as
to maximize this expected gain, that is, always to choose $A_1$. To the subject, who
perceives the situation as one in which the probabilities may change, and who 
is more intent on outwitting the experimenter (or nature) than on maximizing 
expected gain, rationality is something quite different. If rationality is to 
have any meaning independent of the perceptions of the subject we must 
distinguish between the rationality of the perceptions themselves (i.e., 
whether or not the situation as perceived is the real situation) and the 
rationality of the choice, given the perceptions. 

If we accept the proposition that organismic behavior may be
subjectively rational but is unlikely to be objectively rational in a complex
world, then the postulate of rationality loses much of its power for predicting
behavior. To predict how economic man will behave we need to know not 
only that he is rational, but also how he perceives the world—what alternatives 
he sees, and what consequences he attaches to them (5). We should not 
jump to the conclusion, however, that we can therefore get along without 
the concept of rationality. While the Estes model predicts the behavior of
naive subjects under partial reinforcement, we observe (3) that persons
trained in game theory and placed in the same situation generally learn to
choose $A_1$ consistently. It appears simpler to postulate here a change in set
(a change in the perceptual model) rather than to attempt to explain this
behavior in terms of simpler learning-theoretic models. If anything was
learned during the series of trials by the subjects who were game theorists,
it was the appropriate perceptual model and not the appropriate behavior
once that model is assumed.


At least four approaches have been used to estimate communalities
that will leave an observed correlation matrix R Gramian and with minimum 
rank. It has long been known that the square of the observed multiple- 
correlation coefficient is a lower bound to any communality of a variable of 
R. This lower bound actually provides a “best possible” estimate in several
senses. Furthermore, under certain conditions basic to the Spearman- 
Thurstone common-factor theory, the bound must equal the communality 
in the limit as the number of observed variables increases. Otherwise, this 
type of theory cannot hold for R. 


I. Introduction 


One of the intriguing problems of factor analysis has been to find a
formula for communalities that will minimize the rank of an arbitrary
correlation matrix R. More explicitly, the problem is to find a diagonal
matrix U such that $R - U^2$ is Gramian and of minimum rank.

Let n denote the order of R (and of U), and m the minimum rank for
Gramian $R - U^2$. At least four approaches have been used to estimate
communalities that will yield m:


(a) trial-and-error exact formulas 

(b) exact formulas for special cases of R 
(c) successive approximations 

(d) lower bounds. 


The main thesis of this paper is that, in certain senses, the last-mentioned 
of these four approaches provides “best possible” estimates of communalities 
for an arbitrary R, even though biased in general by being underestimates. 

Let $u_j$ be the jth diagonal element of any U that leaves $R - U^2$ Gramian
(whether or not with minimum rank), and let $h_j^2$ be the corresponding
communality:

$$h_j^2 = 1 - u_j^2 \quad (j = 1, 2, \ldots, n). \tag{1}$$

*This research was facilitated by a grant from the Lucius N. Littauer Foundation
to the American Committee for Social Research in Israel in order to promote
methodological work of the Israel Institute of Applied Social Research.

Let $\rho_j$ denote the multiple correlation coefficient of the jth variable in R
on the remaining $n - 1$ variables, and $\sigma_j$ the corresponding standard error
of estimate (assuming all observed variables to have unit variances):

$$\rho_j^2 = 1 - \sigma_j^2 \quad (j = 1, 2, \ldots, n). \tag{2}$$

Then it has been shown (2, 92f; 3, 293) that always

$$\rho_j^2 \le h_j^2 \quad (j = 1, 2, \ldots, n). \tag{3}$$

No better general lower bound to $h_j^2$ has yet been established than $\rho_j^2$.

We shall prove here that there exist many nonsingular matrices R for
which the equality in (3) holds for $n - m$ of the minimizing communalities:
all but m of the $\rho_j^2$ are actual communalities (the remaining communalities
equal unity). Such matrices R, however, are of a very restricted type.

A more generally useful result that we shall establish applies to the
typical R postulated in the Spearman-Thurstone theory. This school of
thought believes a common-factor analysis is meaningful only if m is small
compared with n. We shall prove that if the ratio of m to n tends to zero
as $n \to \infty$, then all except possibly a zero proportion of the $\rho_j^2$ must tend to
the rank-minimizing $h_j^2$. If the Spearman-Thurstone hypothesis is correct for
a given R, then the $\rho_j^2$ must almost always be very good approximations to the
$h_j^2$ when n is large. (Conversely, if the approximation is bad for many $\rho_j^2$, then
the Spearman-Thurstone hypothesis of a limited number of common factors
must be false.)

An even more general result refers to all R, regardless of the ratio of
m to n. If there is to be one and only one unique-factor variable that can
yield the uniqueness $u_j^2$, then it must be that the limit of $\sigma_j^2$ must be $u_j^2$ as
$n \to \infty$ (or it must be that $\rho_j^2 \to h_j^2$). Conversely, if $\sigma_j^2$ does not tend to $u_j^2$
as $n \to \infty$, then there is more than one “unique” variable that can provide
the same loading $u_j$ (and satisfy all other algebraic requirements of
common-factor theory); the larger the difference between $\sigma_j^2$ and $u_j^2$ (or between $\rho_j^2$
and $h_j^2$), the larger the possible difference between alternative “unique”
parts for the same jth observed variable of R.

Other important properties of the lower bounds $\rho_j^2$ will be established.
Before going on to our new results, it may be helpful to review briefly the 
four approaches listed above. 


(a) Trial and Error 


Assuming that sampling error and rounding-off errors in computations
are nonexistent, trial and error is bound to yield an exact numerical answer
when m < n/2; the diagonal elements of $U^2$ in such cases are rational functions
of the non-diagonal elements of R (cf. 8). It may turn out, of course, that $U^2$
is not uniquely determined; two or more different $U^2$ for the same R may
yield m in many cases. When $m \ge n/2$, trial and error can lead again to an
expression for each communality, although in non-rational form in general.
Again, multiple solutions for minimizing $U^2$ may occur.


(b) Special Exact Formulas


Some special cases of R make possible exact and rational formulas that
need no apparent resort to trial and error. The known cases are for m < n/2,
the most celebrated being Spearman’s where m = 1. Thurstone has
summarized a number of such formulas (8, ch. XIII). A caution should be added to
Thurstone’s discussion to the effect that not all the apparent solutions may
yield Gramian $U^2$ nor leave $R - U^2$ Gramian. Actually these formulas beg
the question, for it is generally not known in advance whether or not m < n/2.
A specialized formula in effect must be tried on the given R to see if it works.
Use of specialized formulas thus seems to be but a modified type of trial and
error.


(c) Successive Approximations 


Attempts have been made to avoid a direct exact solution for $U^2$ by
taking recourse instead to successive approximations. An approximation
$U_1^2$ is guessed, and $R - U_1^2$ is “factored” until residuals are considered small
enough, leading to a second approximation $U_2^2$, etc. It has been claimed that
such a procedure generally converges to a satisfactory $U^2$ (cf. 8, p. 295).
Algebraic proof of such convergence has never been published to our
knowledge. For many iterative processes, the value to which convergence takes
place depends on the initial trial value. That this may be the case for the
above procedure seems evident when one recalls that there are many
correlation matrices which do not have a unique set of communalities. Also,
unless proof is given to the contrary, there is no reason to believe that
successive approximations may not converge to some $U^2$ where $R - U^2$ is not
of minimum rank, if convergence takes place at all.

The issue of successive approximations is further beclouded by sampling 
considerations. Lawley’s maximum likelihood solution seems the most 
appropriate put forward to date, as Rao points out (7). To attain precision 
in the sampling theory, apparently some restrictions have been introduced 
as to the nature of the population R, else the possibility of equally minimizing 
alternative solutions would remain. Again, it is not clear when a given R 
obeys these restrictions or when the sampling theory is valid in practice. 
[After the above was written, the writer received a copy of reference (1) in 
which a numerical example is given of the failure of Lawley’s iterative pro- 
cedure to converge properly.] 


(d) Lower Bounds 


If we again ignore sampling and rounding-off errors, it is always possible
to establish useful lower bounds to communalities without any trial and
error and without any hypothesis about or restrictions on R. The best of the
lower bounds thus far established are the $\rho_j^2$, according to inequality (3)
above. It is often more convenient to discuss uniqueness rather than
communalities, or to use inequality (4) rather than (3):

$$u_j^2 \le \sigma_j^2 \quad (j = 1, 2, \ldots, n). \tag{4}$$

An important feature of the bounds in (3) and (4) is that they hold
whether or not there is a multiple solution for $U^2$; they hold for all possible
solutions simultaneously. Indeed, they lead to a criterion for choosing among
multiple solutions, as indicated in the next section.


II. Relationship to the Determinacy of Unique-Factor Scores 


Let $r_j$ denote the multiple-correlation coefficient on the n observed
variables of a unique-factor variable hypothesized to yield the uniqueness
$u_j^2$. It has been shown in (6) that

$$r_j^2 = \frac{u_j^2}{\sigma_j^2} \quad (j = 1, 2, \ldots, n). \tag{5}$$

Since (5) holds for all solutions $U^2$, it suggests that when a choice is necessary,
the solution that brings the inequalities (4) as close to equalities as possible is
most desirable; the denominator on the right of (5) is fixed for j, so that such a choice
makes the individual scores on the unique factor as determinate as possible
from the observed data, or the $r_j^2$ as close as possible to unity. It has been
shown that this also often tends to make individual scores on the
common-factor variables as determinate as possible (6).

Should the approximations (4) for $U^2$ turn out not to be close in a given
case, then the factor analysis itself may be regarded as not very useful or
definitive. For it has been shown in (6) that determining factor loadings
alone, common and unique, can be far from sufficient for pinning down
scores on the hypothesized factors. Alternative sets of scores for a given
hypothesized factor can exist which yield identical loadings and yet correlate negligibly
with each other, according to formula (6),

$$r_j^* = 2r_j^2 - 1 \quad (j = 1, 2, \ldots, n), \tag{6}$$

where $r_j^2$ is given by (5) and $r_j^*$ is the minimal correlation always attainable
between two alternative sets of scores for the same unique factor
hypothesized to underlie $u_j^2$. [According to (6), if $r_j^2 = .5$, then $r_j^* = 0$, or alternative
score solutions for the same jth unique factor always exist that correlate
zero with each other. Even if $r_j^2$ is as large as .9, this raises $r_j^*$ only to .8. An
equation parallel to (6) holds for common factors.]
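
The two numerical cases quoted in brackets follow directly from (5) and (6). A minimal sketch, with hypothetical function names, makes the arithmetic explicit:

```python
def unique_factor_determinacy(uniqueness, sigma_squared):
    """Equation (5): squared multiple correlation of a unique factor
    on the observed variables, r_j^2 = u_j^2 / sigma_j^2."""
    return uniqueness / sigma_squared


def minimal_alternative_correlation(r_squared):
    """Equation (6): smallest correlation attainable between two
    alternative score sets for the same factor, r_j^* = 2 r_j^2 - 1."""
    return 2 * r_squared - 1


if __name__ == "__main__":
    for r_squared in (0.5, 0.9):
        print(r_squared, minimal_alternative_correlation(r_squared))
```

Because (6) is linear in $r_j^2$, any shortfall of $r_j^2$ from unity is doubled in the shortfall of $r_j^*$ from unity.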


III. The “Best Possible” Estimates


Can inequality (4) be improved on without recourse to some form
of trial and error or use of specialized hypotheses? This does not seem
possible. According to (5) this would imply some advance information on
the $r_j^2$; there is no apparent way of getting such information on the $r_j^2$ in a
universally systematic manner. The situation seems to be the reverse: $r_j^2$ is
determined by $u_j^2$ rather than conversely.

The rest of this paper will be devoted largely to showing that (4) is
actually a “best possible” inequality in the sense that the phrase “best
possible” is usually used mathematically for inequalities. The essential
characteristics are that (a) many correlation matrices R exist for which the
equality in (4) is actually attained at the same time that minimum rank m
is attained, and (b) the inequalities in (4) must tend to equalities as n increases,
under certain general conditions important to the theory of common-factor
analysis. The bounds improve systematically in general as n increases, or as
there is more information available from more observed variables.
Furthermore, inequality (4) leads to inequalities for m that are also “best possible,”
and is closely related to the problem of estimating individual scores on the
unique factors without any rank assumptions, via image analysis.

In virtually all attempts to solve the communality problem—whether 
exactly or by successive approximations—the problem is stated as for a 
fixed and finite n, or where R is from a finite number of n observed variables. 
It seems appropriate to ask also what happens to communalities as n in- 
creases or decreases. 

While this issue is not discussed very explicitly by most writers, it 
usually seems implied that if the additional variables retain the same general 
kind of content as the initial ones, communalities of the initial ones should 
remain constant for all n sufficiently large. This would imply that for n 
small enough we should generally have m > n/2, or easy exact computations 
for $U^2$ (even ignoring sampling error) should be the exception rather than the
rule. Having m > n/2 for relatively small m does not preclude m from re- 
maining constant—and hence becoming relatively small—as n increases. 
It does imply that multiple solutions should be quite prevalent for finite n 
in practice. Furthermore, it cautions that an apparently exact solution for 
finite n may be but an artifact due to the finiteness of the number of variables 
observed. 

It would be desirable, in view of all the preceding considerations, to 
have a systematic way of getting information about communalities with no 
assumptions whatsoever about R, yet without resorting to trial and error. 
Furthermore, this information should remain valid as n increases. 

One of the virtues of the bounds (3) and (4) is that they possess these 
qualities in a simple and direct manner. This seems to be another type of 
“best possible” property from that usually considered, and one which appears 
peculiarly relevant to the problem of factor analysis. 


IV. Attaining Equality When n Is Finite 


If $\sigma_j^2 = 0$ for some j (so that $\rho_j^2 = 1$), then it must be that $u_j^2 = 0$ from
(4) and the fact that a uniqueness cannot be negative. Here is one kind of
special circumstance wherein our bound becomes an exact estimate even
when n is finite. In practice, this is not to be expected, since having one
observed variable perfectly predictable from all the rest makes R singular.

Many cases of nonsingular R also exist for which the equality in (4)
holds and n is finite. We shall exhibit some now. To this end, let us first
recall that the $\sigma_j^2$ are the reciprocals of the corresponding main diagonal elements
of $R^{-1}$. The following notation will be useful here and also later. Let $S^{-2}$
(the inverse of $S^2$) be the diagonal matrix with the same main diagonal
elements as $R^{-1}$. Then the jth main diagonal element of $S^2$ itself is simply
$\sigma_j^2$ $(j = 1, 2, \ldots, n)$:

$$S^2 = [\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2]. \tag{7}$$
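
As an illustration of how the $\sigma_j^2$ and the lower bounds $\rho_j^2$ are read off the inverse of R, the following sketch uses exact rational arithmetic. It is not taken from the paper: the helper names, the one-factor example, and the loadings are all assumptions chosen so that the communalities are known in advance.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sigma_squared(R):
    """sigma_j^2 is the reciprocal of the j-th main diagonal element of R^{-1}."""
    R_inv = inverse(R)
    return [1 / R_inv[j][j] for j in range(len(R))]


def one_factor_R(loadings):
    """Hypothetical example: correlation matrix with one common factor,
    so that the communality of variable j is loadings[j] ** 2."""
    n = len(loadings)
    return [[Fraction(1) if i == j else loadings[i] * loadings[j]
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    loadings = [Fraction(4, 5), Fraction(3, 5), Fraction(1, 2), Fraction(7, 10)]
    for a, s2 in zip(loadings, sigma_squared(one_factor_R(loadings))):
        print("rho^2 =", float(1 - s2), " h^2 =", float(a * a))
```

In this example each printed $\rho_j^2$ falls below the corresponding communality, as inequality (3) requires.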
If R is nonsingular, there exists a nonsingular matrix F such that

$$R = FF'. \tag{8}$$

F can be chosen in infinitely many ways for (8) when $n \ge 2$, but always we
can rearrange variables to find an F of the form

$$F = \begin{bmatrix} A & 0 \\ B & C \end{bmatrix}, \tag{9}$$

where A is a nonsingular square submatrix of order m, B is of order
$(n - m) \times m$, and C is nonsingular and of order $n - m$. From (8) and (9),

$$R = \begin{bmatrix} AA' & AB' \\ BA' & BB' + CC' \end{bmatrix}. \tag{10}$$


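It is easily verified that the inverse of F is given by

$$F^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -C^{-1}BA^{-1} & C^{-1} \end{bmatrix}. \tag{11}$$

From (8), $R^{-1} = (F^{-1})'F^{-1}$, or using (11)

$$R^{-1} = \begin{bmatrix} G & H' \\ H & (CC')^{-1} \end{bmatrix}, \tag{12}$$

where

$$G = (AA')^{-1} + (A^{-1})'B'(CC')^{-1}BA^{-1} \tag{13}$$

and

$$H = -(CC')^{-1}BA^{-1}. \tag{14}$$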


Now consider the special case where $CC'$ is a diagonal matrix. Then
$(CC')^{-1}$ is also diagonal. According to (12) and (7), $(CC')^{-1}$ constitutes the
lower right-hand submatrix of $S^{-2}$, or $CC'$ constitutes the corresponding
submatrix of $S^2$ and defines the $\sigma_j^2$ for $j = m + 1, m + 2, \ldots, n$. If we subtract
this submatrix $CC'$ from the lower right-hand corner of R in (10), we are
clearly left with a reduced R that is Gramian and of rank m, it being the
product of $[A' \; B']'$ and its transpose. Thus we have:


Theorem 1. If R can be factored into an F of the form (9) where $CC'$ is
diagonal (and A and C are nonsingular), then the main diagonal elements of
$CC'$ are the respective $\sigma_j^2$ for $j = m + 1, m + 2, \ldots, n$. If these $n - m$ $\sigma_j^2$ in
$CC'$ are subtracted from the corresponding main diagonal elements of R, the
resulting matrix will be Gramian and of rank m.
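
Theorem 1 can be checked on a small case with exact arithmetic. The sketch below is illustrative only: the helper names and the particular F (with n = 4, m = 2 and a diagonal C) are assumptions, and the resulting R is not scaled to unit diagonal, which the algebra of the theorem does not require.

```python
from fractions import Fraction


def transpose(X):
    return [list(row) for row in zip(*X)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rank(matrix):
    """Exact rank by forward elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


def check_theorem_1(F, m):
    """Return all sigma_j^2 and the rank of R after subtracting the last n - m."""
    n = len(F)
    R = matmul(F, transpose(F))
    R_inv = inverse(R)
    sigma2 = [1 / R_inv[j][j] for j in range(n)]
    reduced = [row[:] for row in R]
    for j in range(m, n):
        reduced[j][j] -= sigma2[j]
    return sigma2, rank(reduced)


if __name__ == "__main__":
    # Hypothetical F of form (9): A is 2 x 2, B is 2 x 2, C = diag(3, 2).
    F = [[2, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 2, 3, 0],
         [2, 1, 0, 2]]
    print(check_theorem_1(F, m=2))
```

For this F the last two $\sigma_j^2$ coincide with the diagonal of $CC'$, and the reduced matrix has rank m = 2.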


According to Theorem 1, when m is the actual minimal rank possible
for Gramian $R - U^2$, then the first m diagonal elements of $U^2$ can be set
equal to zero, and the last $n - m$ diagonal elements equal to the corresponding
$\sigma_j^2$ as given by $CC'$. Thus, the last $n - m$ of the $\sigma_j^2$ serve exactly as
rank-minimizing uniquenesses, or the equality in (4) holds for $j = m + 1, m + 2,
\ldots, n$.

Notice that the first m uniquenesses implied by Theorem 1 are zero and
not equal to the $\sigma_j^2$. If the first m $\sigma_j^2$ were also subtracted out from the main
diagonal of R, then the resulting $R - S^2$ would in general not be Gramian,
nor of minimum rank (cf. 4).

Theorem 1 holds even when the m in it is not minimal. It is always 
possible to use the Theorem for the case where C is of order 1, and hence 
CC’ is necessarily a diagonal matrix. This provides: 


Corollary. For any nonsingular R, if any one $\sigma_j^2$ is subtracted from the
corresponding main diagonal element of R, then the resulting matrix is of rank
$n - 1$.


This result was partly indicated by Thurstone in his discussion of the
“diagonal” method of matrix factoring (better known to mathematicians
as the Schmidt or Gram-Schmidt process of orthogonalization), but without
noticing apparently that his implied uniqueness was exactly $\sigma_j^2$ (8, p. 308).

We have thus completed showing that there are many matrices for which
many of the $\sigma_j^2$ can serve as rank-minimizing uniquenesses. Also, we have
the curious result that any one of the $\sigma_j^2$ alone will reduce nonsingular R to
a Gramian matrix of rank $n - 1$.


V. Equality in the Limit as $n \to \infty$


We have already seen in Part III how, if a “unique”-factor variable is
really to be uniquely determined for a given $u_j$, then we must have $u_j^2/\sigma_j^2 \to 1$
as $n \to \infty$, according to (5) and (6). This conclusion does not depend on the
size of m, nor in particular on whether m remains finite or becomes infinite
as $n \to \infty$. It thus applies to ordered factor theories, such as the radex
with its simplexes and circumplexes (5), as well as to limited common-factor
theories like those of Spearman and Thurstone, whenever the $\delta$-law of
deviation (5, p. 308) holds for the unique-factor variables.

Thus, a general sufficient condition for $\sigma_j^2$ to tend to $u_j^2$ when $u_j^2 > 0$ is
that $r_j^2 \to 1$ or $r_j^* \to 1$ as $n \to \infty$. This holds for each j separately.

A less general sufficient condition, and one that does not necessarily
hold for any one j but only for “almost all” j, is given in


Theorem 2. If R is nonsingular for all n, and if $\lim_{n \to \infty} m/n = 0$, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \frac{u_j^2}{\sigma_j^2} = 1.$$

For all except possibly a zero proportion of the j it must be that $\lim_{n \to \infty} u_j^2/\sigma_j^2 = 1$.


The condition that m/n — 0 holds in particular for the Spearman- 
Thurstone approach to factor analysis, which postulates that the number of 
common factors should be small compared to the number of observed 
variables. 

Since $u_j^2/\sigma_j^2 \le 1$ for all j, according to (4), we must have the mean
ratio also bounded above by unity:

$$\frac{1}{n} \sum_{j=1}^{n} \frac{u_j^2}{\sigma_j^2} \le 1. \tag{15}$$

The hypothesis that R is nonsingular for all n ensures that no observed
variable is perfectly predictable from the rest, or that $\sigma_j^2 > 0$ for all j and n,
so that division by $\sigma_j^2$ in (15) is always justified. The first conclusion of
Theorem 2 is that the limit of the left member of (15) as $n \to \infty$ is actually
the right member. But clearly, the mean value of a sequence cannot tend
to an upper bound to each member of the sequence unless almost all members
of the sequence also tend to this upper bound. Hence the second
conclusion of Theorem 2 follows from the first. We need only to establish the first
part of the theorem now.

As is well known, if $R - U^2$ is Gramian and of rank m, we can write

$$R = AA' + U^2, \tag{16}$$

where A is some matrix of order $n \times m$ and of rank m. Let Q be defined as
the symmetric matrix of order m:

$$Q = I_m + A'U^{-2}A, \tag{17}$$

where $I_m$ is the unit matrix of order m. It has been shown in (2, 92) that
Q is Gramian and nonsingular, and furthermore

$$R^{-1} = U^{-2} - U^{-2}AQ^{-1}A'U^{-2}. \tag{18}$$

It is easily verified further, from (18) and (17), that

$$A'R^{-1}A = I_m - Q^{-1}. \tag{19}$$

Since the left member of (19) is clearly Gramian, so must the right member
be. Indeed, it is known that $I_m - Q^{-1}$ is the covariance matrix of the predicted
values (from the observed n variables) of any m orthogonal common-factor
scores underlying loading matrix A (6). Let $q^{kk}$ denote the kth main diagonal
element of $Q^{-1}$, or the variance of estimate of the kth common factor, and let
$p_k^2$ be defined as

$$p_k^2 = 1 - q^{kk} \quad (k = 1, 2, \ldots, m). \tag{20}$$

Then $p_k^2$ is the square of the multiple-correlation coefficient of the kth common
factor from the n observed variables, and

$$p_k^2 \le 1 \quad (k = 1, 2, \ldots, m). \tag{21}$$

Therefore, the trace (or sum of the main diagonal elements) of $I_m - Q^{-1}$
satisfies

$$\operatorname{tr}(I_m - Q^{-1}) = \sum_{k=1}^{m} p_k^2 \le m. \tag{22}$$


We are particularly interested in the trace of $U^2R^{-1}$, for clearly,
remembering (7),

$$\operatorname{tr}(U^2R^{-1}) = \operatorname{tr}(U^2S^{-2}) = \sum_{j=1}^{n} \frac{u_j^2}{\sigma_j^2}. \tag{23}$$

Since the trace of a product is unchanged if order of multiplication is reversed,

$$\operatorname{tr}(A'R^{-1}A) = \operatorname{tr}(AA'R^{-1}) = \operatorname{tr}(I_n - U^2R^{-1}), \tag{24}$$

the last member following from the middle member by recalling (16).
Therefore, taking traces of both members of (19) and using (23), (24), and (22),

$$\sum_{j=1}^{n} \frac{u_j^2}{\sigma_j^2} = n - \sum_{k=1}^{m} p_k^2 \ge n - m. \tag{25}$$

Dividing (25) through by n and prefixing inequality (15),

$$1 \ge \frac{1}{n} \sum_{j=1}^{n} \frac{u_j^2}{\sigma_j^2} = 1 - \frac{1}{n} \sum_{k=1}^{m} p_k^2 \ge 1 - \frac{m}{n}. \tag{26}$$


sl 


Clearly, if m/n — 0 in the last member of (26), the middle members must 
tend to unity, or Theorem 2 is established. 

Notice that Theorem 2 could be rephrased to say that almost all $r_j^2 \to 1$,
or almost all unique-factors must be determinate in the limit. It is interesting
to see this in a slightly different way. From (5) and the middle members of
(26),

$$\frac{1}{n} \sum_{j=1}^{n} r_j^2 = 1 - \frac{1}{n} \sum_{k=1}^{m} p_k^2, \tag{27}$$

or

$$\bar{r}^2 = 1 - \frac{m}{n}\, \bar{p}^2, \tag{28}$$

where $\bar{r}^2$ is the mean predictability of the n unique-factors, while $\bar{p}^2$ is the
mean predictability of the m common factors. When m/n is small, the average
predictability of the common factors cannot influence greatly the average
predictability of the unique factors: $\bar{r}^2$ must be close to unity. A further
consequence is that, if both $\bar{r}^2$ and $\bar{p}^2$ tend to unity as $n \to \infty$, it must be that
$m/n \to 0$. This does not require m to remain finite, of course, but only to
increase at a less rapid pace than does n.


VI. Increase in Information with n 


A desirable property of estimates of communalities is that they should 
improve in general as n increases. Any n variables studied empirically by a 
factor analysis are usually regarded as but a sample of a far larger universe 
of variables. The communalities sought are those of the universe. 

Of the four approaches to estimates outlined in Part I above, the only 
one which has its estimates vary explicitly with n is that of lower bounds. 
In this sense, it is the only one not tied to algebraic artifacts that may arise 
in data due to the finiteness of n of the observed sample of variables (cf. 
3 and 4). 

For fixed j, $\rho_j^2$ must increase with n (or at worst remain constant), for a
multiple-correlation cannot become worse as the number of predictors
increases. If $h_j^2$ is defined as for the universe of variables ($n = \infty$), then $\rho_j^2$
must improve in general as an estimate of $h_j^2$ as n increases, considering (3).
The lower bounds improve as estimates as n increases, taking advantage of
the increased information.

Similarly, if the jth unique-factor scores are defined uniquely as for the
universe of observed variables, $r_j^2$ must in general increase with n. From (5),
this again makes $\sigma_j^2$ an increasingly better estimate of the fixed $u_j^2$ as n
increases.

Thus, the lower bounds automatically take advantage of whatever
new information is brought in with increased n, without making any
assumptions at all. In broad classes of cases, as we have seen, this new
information can make $\rho_j^2 \to h_j^2$ for all or almost all j.
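
The way the lower bound gains from added variables can be seen in a hypothetical special case. The sketch below assumes one common factor with the same loading (0.6, an arbitrary choice) on every variable, so that the communality is 0.36; it uses the closed form of the diagonal of $R^{-1}$ implied by (17) and (18) rather than a general matrix inverse.

```python
def smc_lower_bound(n, loading=0.6):
    """rho_j^2 for any one of n variables sharing a single common factor
    with equal loadings (an assumed special case, not a general formula)."""
    a2 = loading * loading
    u2 = 1 - a2
    q = 1 + n * a2 / u2                  # scalar Q of equation (17)
    r_jj = 1 / u2 - a2 / (u2 * u2 * q)   # diagonal element of R^{-1}, from (18)
    return 1 - 1 / r_jj                  # rho_j^2 = 1 - sigma_j^2


if __name__ == "__main__":
    for n in (2, 3, 5, 10, 50, 1000):
        print(n, smc_lower_bound(n))
```

In this case the bound starts well below the communality for small n and approaches it from below as n increases, without any assumption about m being supplied to the computation.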


VII. Further ‘Best Possible’ Inequalities 


We have concentrated until now on the approximation of the $\sigma_j^2$ to the
$u_j^2$. Related to this is another problem: the estimation of minimum rank m
for Gramian $R - U^2$. We shall show that using the diagonal matrix $S^2$ of
(7) as an estimate of $U^2$ for finite n leads also to a “best possible” inequality
for m, as well as to other important inequalities.

With any nonsingular correlation matrix R is associated another
nonsingular correlation matrix $R^*$ defined by

$$R^* = SR^{-1}S. \tag{29}$$

$R^*$ is clearly Gramian, for $R^{-1}$ is Gramian and S is a diagonal matrix. The
main diagonal elements of $R^*$ are all unity from the definition of S and the
fact that $1/\sigma_j^2$ is the jth diagonal element of $R^{-1}$. Indeed, $R^*$ is the correlation
matrix of the anti-images of the n variables of R (cf. 3, p. 294f). Regardless
of the statistical meaning of $R^*$, it is a perfectly good correlation matrix when
n is finite, and we can seek a diagonal Gramian matrix $U^{*2}$ that will leave
$R^* - U^{*2}$ Gramian and with minimum rank $m^*$. This will lead to the
interesting and important inequality for the case where no $\sigma_j^2$ is a uniqueness nor
equals unity:

$$m + m^* \ge n \quad (u_j^2 < \sigma_j^2 < 1;\; j = 1, 2, \ldots, n). \tag{30}$$


The restrictions that $S^2 - U^2$ and $I - S^2$ be nonsingular are essential here
(consider the counter-example where S = R = $R^*$ = I). That $\sigma_j^2 \ne 1$ ($I - S^2$
be nonsingular) implies that each variable in R has at least one nonzero
correlation with some other variable.

According to (30), if $m/n$ is small, then $m^*/n$ must be large. Conversely, 
if $m^*/n$ is small, $m/n$ must be large. This is rather paradoxical in view of the 
fact that $R^*$ can always be reduced to rank m by subtracting out the diagonal 
matrix $S U^{-2} S$ ($= S^2 U^{-2}$). This follows by pre- and post-multiplying (18) 
through by S, remembering (29), and noting that the second term on the 
right is of rank m. Conversely, R can always be reduced to rank $m^*$ by 
subtracting out $S^{*2} U^{*-2}$, where $S^{*2}$ is the diagonal matrix defined by the main 
diagonal of $R^{*-1}$. Thus, if all diagonal-free submatrices of R have rank less 
than n/2, so must those of $R^*$, and conversely. Regardless, (30) holds. 

In effect, then, (30) implies that to every R for which $\sigma_j^2 \ne u_j^2$ or 1 for 
all j and where m < n/2, there corresponds an $R^*$ which is a generalized 
“Heywood” case (cf. 4, 159f). Although all diagonal-free matrices have 
small rank in $R^*$, no communalities can be found to make $R^* - U^{*2}$ of equally 
small rank and yet be Gramian. It must be that $m^* \ge n - m$. This again 
emphasizes that the case m < n/2 may be the exception, rather than the 
rule, for correlation matrices. And it is interesting that this paradox arises 
precisely for those cases where no $\sigma_j^2$ equals the corresponding $u_j^2$. 

To establish (30), we first recall the theorem (4, 157f) that if $S^2 - U^2$ 
is nonsingular, and if s is the non-negative index of $R - S^2$, then 


$$s \le m \qquad (|S^2 - U^2| > 0). \tag{31}$$

Now, the proof of (31) in (4) can be modified to take care of the case where 
$S^2 - U^2$ is possibly singular, to establish the weaker but more universal 
inequality $p \le m$, where p is the positive index of $R - S^2$. We shall not take 
space to prove this modification here, but shall merely state it in terms of 
our needs for $R^*$: 


$$p^{*} \le m^{*}, \tag{32}$$


where $p^*$ is the positive index of $R^* - S^{*2}$, and we do not necessarily assume 
$S^{*2} - U^{*2}$ to be nonsingular. 

Now, from (29), $R^{*-1} = S^{-1} R S^{-1}$, or since the main diagonal elements 
of R are all unity, 


$$S^{*2} = S^2. \tag{33}$$


It is interesting to note that (33) and (29) imply that $(R^*)^* = R$, or R is 
to $R^*$ as $R^*$ is to R. 
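
The relations (29) and (33) are easy to check numerically. The following is a minimal sketch, not part of the original derivation: the 3 x 3 correlation matrix is a hypothetical example, and the helper names `inverse` and `anti_image` are illustrative assumptions. It forms $R^* = S R^{-1} S$ with $\sigma_j^2$ taken as the reciprocal of the jth diagonal element of $R^{-1}$, and then applies the same construction to $R^*$.

```python
import math

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def anti_image(r):
    """Return (R*, sigma), where R* = S R^{-1} S and sigma_j^2 = 1 / (jth diagonal of R^{-1})."""
    n = len(r)
    r_inv = inverse(r)
    sigma = [1.0 / math.sqrt(r_inv[j][j]) for j in range(n)]
    r_star = [[sigma[i] * r_inv[i][j] * sigma[j] for j in range(n)] for i in range(n)]
    return r_star, sigma

# Hypothetical nonsingular correlation matrix (illustration only).
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]

R_star, sigma = anti_image(R)            # equation (29)
R_back, sigma_star = anti_image(R_star)  # the same construction applied to R*
```

Under these assumptions `R_star` has unit diagonal, `sigma_star` agrees with `sigma` as (33) states, and `R_back` reproduces `R`, which is the relation $(R^*)^* = R$.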

Statistically, (33) implies that the relative predictability of the jth 
anti-image from the n - 1 remaining anti-images is the same as for the jth original 
variable from the n - 1 remaining original variables. From (29) and (33) 
we can write the identity 


$$R^{*} - S^{*2} = S(R^{-1} - I)S. \tag{34}$$


Sylvester’s “law of inertia” (cf. 4, p. 152) applied to (34) shows that 
$p^*$ equals the positive index of $R^{-1} - I$, which in turn clearly equals the 
number of latent roots of $R^{-1}$ greater than unity. Hence $p^*$ equals the number 
of latent roots of R itself which are less than unity. But it has been shown in 
(4) that s is not less than the number of latent roots of R which are greater 
than or equal to unity whenever $I - S^2$ is nonsingular. Since R has n latent 
roots all told, it follows that 


$$s + p^{*} \ge n \qquad (|I - S^2| > 0). \tag{35}$$

Inequality (30) follows from (31), (32), and (35). 


To prove that (30) is a “best possible” inequality, we must show that 
matrices R exist for which the equality sign holds. It suffices to consider an 
R which has only two distinct latent roots, say $\lambda_1 > 1$ with multiplicity f 
and $\lambda_2 < 1$ with multiplicity $f^* = n - f$. Then it must be that 


$$m = f, \qquad m^{*} = f^{*}. \tag{36}$$


For $m \ge f$ by inequality (39) of (4, 159), and hence m = f by considering 
that $R - \lambda_2 I$ is Gramian and of rank f; $m^* = f^*$ by analogous reasoning 
on $R^{-1}$. Since $f + f^* = n$, (36) provides a special case where the equality 
in (30) holds. 

Inequality (31) by itself is similarly a “best possible” one. Consider the 
case where $R^*$ has two distinct latent roots, say $\lambda_1 > 1$ with multiplicity $p^*$ 
and $\lambda_2 < 1$ with multiplicity $p = n - p^*$. Since $R^{*-1} - I = S^{-1} R S^{-1} - I = 
S^{-1}(R - S^2)S^{-1}$, p is the positive index of $R - S^2$ while $p^*$ is that of 
$R^* - S^{*2}$. Also, since no root vanishes, p = s or the positive and non-negative 
indices coincide. Since $R^{*-1} - \lambda_1^{-1} I = S^{-1} R S^{-1} - \lambda_1^{-1} I$ is Gramian and of 
rank p = s, so must $R - \lambda_1^{-1} S^2$ be, or the equality in (31) must hold for this 
case. 


VIII. Relation to Image Analysis 


The ratio of $u_j^2$ to $\sigma_j^2$ indicates the relative predictability of the jth 
unique-factor scores from the n observed variables of R, according to (5). 
Closely related is another parameter developed in image theory and denoted 
by $\delta_j^2$, namely, the variance of the difference between the respective scores on 
the jth anti-image and the jth unique factor. It turns out (3, 293) that $\delta_j^2$ 
can be computed as the simple difference 


$$\delta_j^2 = \sigma_j^2 - u_j^2 \qquad (j = 1, 2, \cdots, n). \tag{37}$$


Hence, a necessary and sufficient condition that $\sigma_j^2 \to u_j^2$ as $n \to \infty$ is that 
$\delta_j^2 \to 0$. This implies that the unique-factor scores must be essentially the 
total anti-image scores from the universe of content. Here we have the 
individual anti-images themselves as increasingly better estimates of the 
unique-factor scores as $n \to \infty$. This problem of estimating scores is perhaps 
even more basic than that of estimating only over-all parameters, such as 
uniquenesses, which are based on the scores. Estimating $U^2$ by $S^2$ has the 
important property of tying in directly with the score estimation problem 
via image analysis. 

RANK-BISERIAL CORRELATION 


Edward E. Cureton 

A formula is developed for the correlation between a ranking (possibly 
including ties) and a dichotomy, with limits which are always ±1. This 
formula is shown to be equivalent both to Kendall’s τ and Spearman’s ρ. 


Suppose we have two correlated variables, one represented by a ranking 
(possibly including ties) and the other by a dichotomy. The dichotomy 
may be considered a ranking concentrated into two multiple ties; its ties, 
however, do not represent equal measurements (or judgments of equality) 
on a continuous (or at least a many-step) variable. Rather, the ties represent 
a broad grouping of the data into two categories, or possibly an actual 
two-point distribution (sex, e.g.). Since the number of distinct ranks in the 
ranked variable will always be much greater than 2 and will equal N in the 
untied case, exact rank agreement of the two variables, pair by pair for each 
individual, is impossible. In this situation we desire a coefficient which will 
still have attainable limits ±1 in all circumstances. It should be +1 when 
all ranks in the “higher” category of the dichotomy exceed all ranks in the 
“lower” category, and —1 when all ranks in the “lower” category exceed 
all ranks in the “higher” category. It should be strictly non-parametric, i.e., 
defined wholly in terms of inversions and agreements between pairs of rank- 
pairs, without use of such concepts as mean, variance, covariance, or re- 
gression. Finally, it should resemble the usual rank correlation coefficients 
in some reasonable sense. 

Let $R_1$ represent the dichotomy, with categories $R_1+$ and $R_1-$, and 
let $R_2$ represent the ranked variable. Ties in $R_2$ are to be handled by the 
mid-rank method. We then arrange the ranks $R_2$ in as nearly as possible 
the natural order (N, N - 1, ..., 1), with rank N “high” and rank 1 “low,” 
and allocate them to the categories $R_1+$ and $R_1-$ as in the following 
example: 

    R1+      R1-      Inv.     Agr. 
    9.5                         4 
    9.5                         4 
    8                           4 
    6.5                         3 
             6.5       2 
    4.5                         3            (1) 
    4.5                         3 
             2.5 
             2.5 
             1 

    N1 = 6   N2 = 4   Q = 2    P = 21 


No two $R_2$ ranks may be in the same row, but in case of a tie in $R_2$ with 
one member falling under $R_1+$ and the other under $R_1-$, the relation 
between the row and column allocations is immaterial. Thus, in (1), the first 
6.5 might as well have been allocated to $R_1-$ and the second to $R_1+$. 

With this arrangement, there is an inversion at any given number under 
$R_1-$ for every smaller number under $R_1+$. Thus, at 6.5 in $R_1-$ we have 
two inversions, one for each of the values 4.5 under $R_1+$. There is also 
an agreement at any given number under $R_1+$ for every smaller number 
under $R_1-$. Let Q be the total number of inversions, and let P be the total 
number of agreements. 

With this method of allocation to rows and columns, perfect positive 
correlation would require that all numbers under $R_1+$ should be larger 
than all numbers under $R_1-$, and in this case we should find that Q = 0 
and $P = P_{\max}$. Perfect negative correlation would require that all numbers 
under $R_1+$ should be smaller than all numbers under $R_1-$, and in this case 
we should find that P = 0 and $Q = Q_{\max}$. Also, $P_{\max} = Q_{\max}$, since the two 
result merely from an interchange of the sets of numbers under $R_1+$ and 
$R_1-$. Our coefficient may therefore be of the form 


$$r_{rb} = (P - Q)/P_{\max}. \tag{2}$$


It will be +1 if Q = 0 and $P = P_{\max}$; -1 if P = 0 and $Q = Q_{\max} = P_{\max}$; 
and 0 if P = Q. 

To determine $P_{\max}$, we note first that in the situation in which the 
coefficient is +1, there will be $N_2$ agreements for every number under $R_1+$, 
or $N_1 N_2$ in all. There is one case, however, so far passed over, in which $P_{\max}$ 
cannot be as great as $N_1 N_2$. This case is illustrated in our example. If we 
set up explicitly the situation for $P = P_{\max}$ with these data, we have: 

    R1+      R1-      Inv.     Agr. 
    9.5                         4 
    9.5                         4 
    8                           4 
    6.5                         4 
    6.5                         4 
    4.5                         3            (3) 
             4.5 
             2.5 
             2.5 
             1 

    N1 N2 = 24 

One agreement is lost because the lowest rank under $R_1+$ is tied with the 
highest under $R_1-$. In other cases there might be a triple or multiple tie 
at the point of dichotomy. We shall term a tie at this point a bracket tie. 
For any bracket tie, the value of $P_{\max}$ will be reduced from $N_1 N_2$ by unity 
for every pair of members of this tie one of which is under $R_1+$ and the 
other under $R_1-$, after $R_2$ has been rearranged to be as nearly as possible 
in the natural order and allocation under $R_1+$ and $R_1-$ is made in such a 
manner as to preserve the original values of $N_1$ and $N_2$. If $t_1$ is the number 
under $R_1+$ participating in the bracket tie, and $t_2$ the number under $R_1-$, 
$P_{\max} = N_1 N_2 - t_1 t_2$, and our formula becomes 

$$r_{rb} = \frac{P - Q}{N_1 N_2 - t_1 t_2}. \tag{4}$$


Physically, it is not necessary to rearrange the original data in order 
to compute $r_{rb}$. We merely draw a horizontal line across columns $R_1+$ and 
$R_1-$ in (1), at a level which leaves $N_1$ cases above the line and $N_2$ below it. 
Since the original arrangement in (1) was with $R_2$ in as nearly as possible 
the natural order, a bracket tie will then consist of any group of identical 
numbers, some immediately above and some immediately below this line. 
The number above is $t_1$ and the number below is $t_2$. For the example of (1), 
we find by (4): 

$$r_{rb} = \frac{21 - 2}{6(4) - (1)(1)} = .826.$$
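
The counting rules for P, Q, and the bracket tie can be sketched in a few lines of code. This is only an illustrative sketch: the function name `rank_biserial` and its argument names are assumptions, the inputs are taken to be the mid-ranks falling under the "+" and "-" categories of the dichotomy, and the degenerate case of a zero denominator is not handled.

```python
from fractions import Fraction

def rank_biserial(plus, minus):
    """Return (P, Q, r_rb) from the mid-ranks under the '+' and '-' categories."""
    n1, n2 = len(plus), len(minus)
    # An agreement: a '+' rank exceeds a '-' rank. An inversion: the reverse.
    p = sum(1 for a in plus for b in minus if a > b)
    q = sum(1 for a in plus for b in minus if a < b)
    # Draw the line leaving n1 cases above and n2 below in the natural order.
    ordered = sorted(list(plus) + list(minus), reverse=True)
    above, below = ordered[:n1], ordered[n1:]
    t1 = t2 = 0
    if above and below and above[-1] == below[0]:
        # Bracket tie: identical ranks immediately above and below the line.
        t1 = above.count(above[-1])
        t2 = below.count(below[0])
    return p, q, Fraction(p - q, n1 * n2 - t1 * t2)

# The data of example (1).
P, Q, r_rb = rank_biserial([9.5, 9.5, 8, 6.5, 4.5, 4.5], [6.5, 2.5, 2.5, 1])
```

For the data of (1) this gives P = 21, Q = 2, and a coefficient of 19/23, which rounds to .826.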


Clearly $r_{rb}$ is a Kendall-type coefficient, since Q and P are the numbers 
of unweighted inversions and agreements, respectively (2). But it is also a 
Spearman-type coefficient. Durbin and Stuart (1) have shown that, in the 
untied case, Spearman’s coefficient is given by $(U - V)/(U - V)_{\max}$, where 
V is the number of inversions and U the number of agreements, each weighted 
by the difference between the two ranks concerned. It is easily shown that 
the difference which supplies the weight may come from either $R_1$ or $R_2$, 
and it is also easy to find $(U - V)_{\max}$ for the cases corresponding to Kendall’s 
$\rho_a$ and $\rho_b$. The writer has not been able to prove in these cases that the 
values given by $(U - V)/(U - V)_{\max}$ are necessarily equal in general to 
those given by the corresponding formulas based on $\sum d^2$, but he has verified 
each of them on several sets of numerical data. 

In the present case, we need merely note that all $R_1$ values bracketed 
under $R_1+$ would have one mid-rank value, and all those bracketed under 
$R_1-$ another. If, then, we weight each inversion and agreement by the 
corresponding rank-difference in $R_1$, all weights will be equal (and equal to 
the difference between the two mid-rank values), and it follows at once that 
$r_{rb}$ is a Spearman-type coefficient. 

The hypothesis that $r_{rb}$ differs only by chance from $\rho_{rb} = 0$ may be 
tested by the Mann-Whitney extension of the Wilcoxon test (3). 

A NOMOGRAM FOR FACTOR ANALYSTS* 
P. D. DEUEL 


UNIVERSITY OF CALIFORNIA 


When a new reference vector is chosen graphically from the plane of 
two old ones, its direction cosines as well as the projections of the tests on 
it are most easily computed by applying certain multipliers d and Sd to 
quantities which are already known. The nomogram quickly supplies d, 
after S has been read from the graph. 


The nomogram accompanying this article reduces the computing work 
in what is perhaps the most popular of the graphical rotation methods of 
factor analysis, namely, the diagrammatic method explained in Thurstone 
(1, pp. 194-216) with rotations made in one plane at a time. Use of the 
nomogram will be explained in terms of the following figure: 

As usual, A and B represent trial reference vectors and the dots represent 
the projections of the test vector termini in the plane determined by A and 
B. (Following psychological usage, we refer to the “given” or analyzed 
variables as tests.) The cosine, .383, of the angle θ between A and B has 
been recorded in the lower left-hand portion of the diagram; it was obtained 
in verifying the linear independence of A and B. The figure suggests 
replacement of A by a new trial reference vector A'', collinear with the 
dashed-line vector A', for five tests have nearly vanishing projections on A'. From 
the diagram we read the vector equation A' = A + .647B. The projections 
of all the tests on A'', as well as the direction cosines of A'', are required. 

*Suggestions by Norman Livson, Thomas Nichols, and Karyl Atherton have been 
incorporated in the nomogram. Mrs. Atherton also checked the necessary computations. 
In addition, I am obligated to Katherine Eardley, scientific illustrator, for her care in 
lettering and inking the original. 

More generally, the method under discussion will always yield a “long 
reference vector” A' related to known reference vectors A and B by one of 
the equations A' = A + SB or -A' = A + SB, for some number S between 
-1 and 1. When A' = A + SB, the inner product (V, A'') between any 
vector V and the unit vector A'' collinear with A' is 


$$(V, A'') = d\,(V, A) + Sd\,(V, B), \tag{1}$$


where d is $(A', A')^{-1/2}$, the reciprocal of the length of A'. If V is a test vector, 
(1) gives the desired projection (V, A'') of V on A'' in terms of the known 
projections (V, A) and (V, B) of V on A and B. Similarly, if V is one of the 
orthogonal basic vectors, (1) gives the desired direction cosine (V, A'') of 
A'' with respect to V in terms of the given direction cosines (V, A) and 
(V, B) of A and B with respect to V. Thus, when d is known, all the required 
quantities are obtained from (1) by applying the multipliers d and Sd to 
columns containing the relevant inner products (V, A) and (V, B). The 
case -A' = A + SB is accommodated by simply negating the right side 
of (1). 

Upon noting that (A, B) is the cosine of the angle θ between A and B, 
it is seen that 

$$d = \left(|S + \cos\theta|^2 - |\cos\theta|^2 + 1\right)^{-1/2}, \tag{2}$$


a graphable function of | S + cos θ | and | cos θ |, and in fact, the function 
represented in the nomogram. To determine d, therefore, it is only necessary 
to perform the addition S + cos θ, enter the nomogram with arguments 
| S + cos θ | and | cos θ |, and read d from the appropriate inside scale. To 
read the nomogram: 


1. Locate | S + cos θ | on left-most stem and | cos θ | on right-most stem. 

2. Align a straightedge through the two points thus found. 

3. Pick out appropriate inside scale. This is the scale whose label (at 
top) combines the labels of the scales on which | S + cos θ | and 
| cos θ | were found. 

4. Read d where straightedge crosses appropriate inside scale. 


Generally, S is read from a graph and cannot be identified with better 
than two-place accuracy. In such cases a gratuitous third digit may be 
appended to S, so chosen as to render the third digit of the sum S + cos θ 
zero. This device makes for greater ease and precision in locating values of 
| S + cos θ | near the top of the left-hand stem. 

In our example, the value +.64 + for S was read from the graph, and 
the final digit 7 was selected to complement the third digit 3 of cos θ. The 
addition S + cos θ = 1.03 was performed directly on the diagram. The 
value 1.03 for | S + cos θ | was then found on the B-scale of the nomogram; 
the value .383 for | cos θ | was located on the X-scale; hence, d = .723 was 
read from the BX-scale. 
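
Where the nomogram is not at hand, d can be computed directly from (2) and the multipliers d and Sd applied as in (1). The sketch below is only illustrative; the function names `multiplier_d` and `project` are assumptions and do not come from the article.

```python
import math

def multiplier_d(s, cos_theta):
    """Equation (2): reciprocal of the length of A' = A + S*B."""
    return 1.0 / math.sqrt((s + cos_theta) ** 2 - cos_theta ** 2 + 1.0)

def project(v_on_a, v_on_b, s, cos_theta):
    """Equation (1): (V, A'') from the known inner products (V, A) and (V, B)."""
    d = multiplier_d(s, cos_theta)
    return d * v_on_a + s * d * v_on_b

# The example of the text: S = .647, cos(theta) = .383.
d = multiplier_d(0.647, 0.383)
```

With S = .647 and cos θ = .383 the computed d rounds to .723, the value read from the BX-scale. For the case -A' = A + SB the result of `project` would simply be negated, as the text describes.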

The scale factor of the nomogram varies widely from the lower part 
of the BX-scale to the upper region of the AY-scale. The instrument has 
been designed, however, with the purpose of securing three-place accuracy 
for virtually all cases which arise in practice; extensive applications of the 
nomogram, both to Thurstone’s illustrative material and to original factor 
analyses, indicate that this goal has been well attained. In successive 
rotations of the above type, the liberal AX-scale carries by far the most traffic. 

Larger copies of the nomogram may be obtained by writing to the 
author at the Institute of Child Welfare, University of California, Berkeley 
4, California. The copies have stems approximately 93’’ long and are printed 
on 8½'' x 11'' index-card paper. 

A NOTE ON 
THE ESTIMATION OF NONSPURIOUS CORRELATIONS 


WILLIAM H. ANGOFF 


EDUCATIONAL TESTING SERVICE 


A method is provided for estimating the nonspurious correlation of 
a part of a test with the total test. Two cases are considered: one in which 
the actual subtest is parallel to the total test, the other in which the actual 
subtest is not parallel to the total test. 


A problem that frequently arises in the examination of test data is that 
of estimating the degree of relationship of a test with a subsection of items 
drawn from the test. In general two methods are available: one is to correlate 
the subtest with the total parent test; the other is to correlate the subtest 
with the complementary subtest resulting from the subtraction of the first 
subtest from the total. In the first method, it is clear that spuriousness 
exists; even if the subtest is totally unreliable and correlates zero with the 
complementary subtest, the computed correlation of the part with the total 
would result in a value greater than zero, and roughly in the proportion 
represented by the subtest in the total. In general the correlation would be 


$$r_{jt} = (\sigma_j + r_{jh}\sigma_h)/\sigma_t, \tag{1}$$


where t is the total test, and j and h are the complementary subsections of 
the total test. Even if $r_{jh} = 0$, $r_{jt}$ is still greater than zero, merely by virtue 
of the presence of j in the total test. 

The second method, on the other hand, defeats its own purpose; it 
yields a correlation of the subsection j with the complementary subsection 
h, not a correlation with a test of the length t. 

In order to estimate the nonspurious correlation, say $r_{j't}$, consider test 
t to be an unspeeded test of power, and, as before, to comprise two parts, 
j and h. Also consider a hypothetical test, j', exactly parallel and of equivalent 
effective length (1) to j. Subtests j and h need not be parallel forms. Then 


$$r_{j't} = (r_{jj'}\sigma_j + r_{j'h}\sigma_h)/\sigma_t. \tag{2}$$


The notation in (2) may be modified slightly. Since j and j' are parallel, 
$r_{j'h}$ may be written $r_{jh}$. Also $r_{jj'}$, which expresses the reliability of test j 
as the correlation between parallel forms, may be written in the conventional 
notation as $r_{jj}$. Henceforth, $r_{j'h}$ will be written as $r_{jh}$, and $r_{jj'}$ will be written 
as $r_{jj}$. However, the notation $r_{j't}$, designating nonspurious correlation, will 
be retained to distinguish it from the spurious correlation, $r_{jt}$. With the 
foregoing modifications in notation the formula for nonspurious correlation is 


$$r_{j't} = (r_{jj}\sigma_j + r_{jh}\sigma_h)/\sigma_t. \tag{3}$$


In the actual situation, the reliability, $r_{jj}$, will probably be estimated best 
by one of the internal consistency reliability formulas appropriate to power 
tests. 

If now a further restriction is placed on test t, namely, that j and h be 
parallel (but not necessarily of equivalent length), then further simplification 
is possible. It has been observed (1) that under this restriction 


$$r_{jj} = r_{jt}^2\, r_{tt}, \tag{4}$$


and also 


$$r_{tt} = \frac{r_{jt}\sigma_t - \sigma_j}{r_{jt}(\sigma_t - r_{jt}\sigma_j)}. \tag{5}$$


Substituting (5) in (4), 


$$r_{jj} = \frac{r_{jt}(r_{jt}\sigma_t - \sigma_j)}{\sigma_t - r_{jt}\sigma_j}. \tag{6}$$

It may also be seen that if $r_{jt}\sigma_t - \sigma_j$ is substituted in (3) for its equivalent, 
$r_{jh}\sigma_h$ [see equation (1)], then 


$$r_{j't} = (r_{jj}\sigma_j + r_{jt}\sigma_t - \sigma_j)/\sigma_t. \tag{7}$$


Finally, substituting (6) in (7), 


$$r_{j't} = \frac{r_{jt}\sigma_t - \sigma_j}{\sigma_t - r_{jt}\sigma_j}. \tag{8}$$

Two formulas are thus presented for estimating the nonspurious corre- 
lation of a subtest of items with the total test from which it is drawn. In 
(3) and in its equivalent, (7), no restriction is imposed on the kinds of items 
drawn from the total test. These equations could, for example, represent the 
estimated correlation of a subset of arithmetic items drawn from a hetero- 
geneous total test consisting of arithmetic, verbal, and spatial items. Equation 
(8), on the other hand, requires that the subset of items be essentially a 
short parallel form or miniature of the total test. Whereas (3) and (7) require 
a separate determination of the reliability of the subtest by means of internal 
consistency methods, such as the Kuder-Richardson formulas, (8) permits 
that estimate to be made implicitly as an integral part of the estimate of 
nonspurious correlation. It is the added restriction that j is parallel to t that 
makes the simplification possible. 
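
A small numerical check may make the relation between (3) and (8) concrete. The sketch below is not from the article: the function names are assumptions, and the figures are hypothetical, built so that j and h are parallel parts of unequal length (true-score weights 1 and 3 on a common unit-variance true score, error variances 1 and 3, and therefore $\sigma_j^2 = 2$, $\sigma_h^2 = 12$, $r_{jj} = .5$).

```python
import math

def spurious_r(sd_j, sd_h, r_jh):
    """Equation (1): part-total correlation r_jt, together with sigma_t for t = j + h."""
    sd_t = math.sqrt(sd_j ** 2 + sd_h ** 2 + 2.0 * r_jh * sd_j * sd_h)
    return (sd_j + r_jh * sd_h) / sd_t, sd_t

def nonspurious_general(r_jj, sd_j, r_jh, sd_h, sd_t):
    """Equation (3): needs a separate estimate of the subtest reliability r_jj."""
    return (r_jj * sd_j + r_jh * sd_h) / sd_t

def nonspurious_parallel(r_jt, sd_j, sd_t):
    """Equation (8): valid only under the added restriction of parallel parts."""
    return (r_jt * sd_t - sd_j) / (sd_t - r_jt * sd_j)

# Hypothetical values (illustration only), constructed as described above.
sd_j, sd_h = math.sqrt(2.0), math.sqrt(12.0)
r_jh = 3.0 / math.sqrt(24.0)
r_jj = 0.5

r_jt, sd_t = spurious_r(sd_j, sd_h, r_jh)
by_eq3 = nonspurious_general(r_jj, sd_j, r_jh, sd_h, sd_t)
by_eq8 = nonspurious_parallel(r_jt, sd_j, sd_t)
```

Under these assumed values the spurious correlation is about .79, while (3) and (8) agree on a nonspurious value of about .63. The agreement depends on the constructed parallelism; for a subtest that is not parallel to the total, only (3) or (7) would apply.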

Additional algebraic simplifications may be made in (8). From (1) it 
is seen that the numerator of (8) may be written $r_{jh}\sigma_h$. It may also be seen 
that the denominator of (8) may be written $r_{th}\sigma_h$ since 

$$r_{th} = r_{t(t-j)} = (\sigma_t - r_{jt}\sigma_j)/\sigma_h. \tag{9}$$


Thus (8) may be written 


$$r_{j't} = r_{jh}\sigma_h / r_{th}\sigma_h = r_{jh}/r_{th}. \tag{10}$$

Also, from (6) and (8) 

$$r_{j't} = r_{jj}/r_{jt}; \tag{11}$$

and finally, from (5) and (8) 

$$r_{j't} = r_{tt}\, r_{jt}. \tag{12}$$

THE VARIANCE OF THE NUMBER OF MUTUAL CHOICES IN 
SOCIOMETRY 


Leo Katz 
MICHIGAN STATE UNIVERSITY 


AND 


THURLOW R. WILSON 
UNIVERSITY OF NEW MEXICO* 


The variance of the number of mutual dyads in a sociometric situation 
where each member of a group chooses independently and at random is 
derived for unrestricted numbers of choices per group member, as well as 
for a fixed number of choices. The distribution of the number of mutuals is 


considered. 


I. Introduction 


In a sociometric test, a number of the pairs of group members (dyads) 
may show mutuality. A mutual dyad is defined by the two subjects selecting 
one another. When each member of the group makes a fixed number of 
choices, the number of mutual dyads has been assumed to have a binomial 
distribution with p, the probability of a success, equal to the probability 
that a given dyad is mutual, and n, the number of binomial observations, 
equal to the number of dyads (1, 7). Several writers have pointed out that 
since sampling is without replacement, this is not correct (2, 4, 6). Let M 
represent the number of mutual dyads for a group of N members. If each 
member independently makes d selections at random from the N - 1 other 
members, the probability that a given dyad is mutual is $d^2/(N-1)^2$, and 
the number of dyads is N(N - 1)/2. The expected value of M is not affected 
by the non-replacement in sampling and is given by 

$$E(M) = np = \frac{N d^2}{2(N-1)}. \tag{1}$$

We shall show that for a fixed number of choices 

$$\mathrm{Var}(M) = E(M)\left(1 - \frac{d}{N-1}\right)^2 \tag{2}$$

rather than the variance appropriate to the binomial distribution, 

$$\mathrm{Var}(\text{binomial}) = npq = E(M)\left[1 - \left(\frac{d}{N-1}\right)^2\right]. \tag{3}$$


*We are indebted to Robert Bush and Hartley Rodgers of Harvard University for 
helpful criticisms. 




The ratio of the binomial estimate to the variance of (2) is (N - 1 + d)/(N - 1 - d); 
the binomial formula will appreciably overestimate the variance of 
M if the size of the group is small. 

When the members of the group do not make the same number of 
choices, it has been assumed that one may approximate the variance of M 
by using the average number of choices, $\bar{d}$, in the formula for fixed d. We shall 
develop the expression for the variance of M for the general case of unrestricted 
numbers of choices and examine the approximation with $\bar{d}$. Finally we 
consider the distribution of the number of mutual dyads. 


II. Variance with Fixed Number of Choices 


We first derive the variance of M for the case where each member 
makes the same number of choices, d. Define a random variable $X_{ij}$ for the 
particular dyad composed of individuals i and j so that $X_{ij}$ is 1 or 0 according 
as dyad ij is or is not mutual. $X_{ij}$ is a binomial variable; thus 


$$\mathrm{Var}(X_{ij}) = \frac{d^2}{(N-1)^2}\left(1 - \frac{d^2}{(N-1)^2}\right). \tag{4}$$

Since 

$$M = \sum X_{ij}, \tag{5}$$

where the sum is taken over all distinct dyads, we have 

$$\mathrm{Var}(M) = \sum \mathrm{Var}(X_{ij}) + 2\sum \mathrm{Cov}(X_{ij}, X_{kl}). \tag{6}$$


The covariances are summed for all distinct pairs of dyads. 

If two dyads are composed of four different individuals, the occurrence 
of mutual choice for the first dyad is independent of the occurrence of mutual 
choice for the second; hence the covariance of such a pair is zero. A pair of 
dyads may not have two members in common, but a pair may have one 
person in common. There are N(N - 1)(N - 2)/2 such pairs. The expected 
value of the product $X_{ij} \cdot X_{ik}$ is just the probability that both are equal to 1: 


$$E(X_{ij} \cdot X_{ik}) = \frac{d^3(d-1)}{(N-1)^3(N-2)}, \tag{7}$$


where dyads ij and ik have one member in common. We find the covariance 
for an overlapping pair of dyads: 


$$\mathrm{Cov}(X_{ij}, X_{ik}) = -\frac{d^3(N-1-d)}{(N-1)^4(N-2)}. \tag{8}$$


Substituting the values from (4) and (8) in (6) and simplifying we obtain 
the variance of mutual choices given by (2). 
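
For very small groups the fixed-d result can be checked by exhaustive enumeration. The sketch below is illustrative only; the function names are assumptions, and the enumeration grows too quickly to be practical beyond a handful of members. It evaluates (1), (2), and (3) as exact fractions and compares the first two with the mean and variance of M over all equally likely choice patterns.

```python
from fractions import Fraction
from itertools import combinations, product

def fixed_d_moments(n, d):
    """Return E(M) by (1), Var(M) by (2), and the binomial variance by (3)."""
    mean = Fraction(n * d * d, 2 * (n - 1))
    var = mean * (1 - Fraction(d, n - 1)) ** 2
    binomial_var = mean * (1 - Fraction(d, n - 1) ** 2)
    return mean, var, binomial_var

def enumerated_moments(n, d):
    """Exact mean and variance of M over every way n members can each choose d others."""
    options = [
        [frozenset(c) for c in combinations([j for j in range(n) if j != i], d)]
        for i in range(n)
    ]
    counts = []
    for outcome in product(*options):
        # A dyad (i, j) is mutual when each member chose the other.
        m = sum(1 for i, j in combinations(range(n), 2)
                if j in outcome[i] and i in outcome[j])
        counts.append(m)
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, var
```

For example, `enumerated_moments(5, 2)` can be compared with the first two values of `fixed_d_moments(5, 2)`, and the ratio of the third value to the second reproduces (N - 1 + d)/(N - 1 - d).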
III. Variance with Unrestricted Numbers of Choices 


We now derive the variance for the case where the members make 
unrestricted numbers of choices. Let $d_i$ denote the number of choices actually 
made by subject i. The probability that dyad ij is mutual is given by 

$$p_{ij} = \Pr(X_{ij} = 1) = \frac{d_i d_j}{(N-1)^2}. \tag{9}$$

The probability that both dyad ij and kl are mutual is 

$$p_{ij,kl} = \Pr(X_{ij} \cdot X_{kl} = 1) = \frac{d_i d_j d_k d_l}{(N-1)^4}, \tag{10a}$$

if all four individuals are different, while 

$$p_{ij,ik} = \Pr(X_{ij} \cdot X_{ik} = 1) = \frac{d_i(d_i - 1)\, d_j d_k}{(N-1)^3(N-2)}, \tag{10b}$$

if the two dyads consist of three different members. 
Conventional methods for obtaining the variance (5, pp. 60 ff.) involve 
computing two sums, 


$$S_1 = \sum p_{ij}, \tag{11a}$$

$$S_2 = \sum p_{ij,kl}, \tag{11b}$$


where both sums are evaluated over all distinct sets of subscripts. Considering 
(5), (9), and (11a) we see that the mean of the number of mutuals is $S_1$. 
To express Var (M) as a function of $S_1$ and $S_2$ we first note that 


$$\mathrm{Var}(X_{ij}) = p_{ij} - (p_{ij})^2, \tag{12}$$

and 

$$\mathrm{Cov}(X_{ij}, X_{kl}) = p_{ij,kl} - p_{ij}\, p_{kl}. \tag{13}$$

Summing over the variances and covariances yields 

$$\sum \mathrm{Var}(X_{ij}) = S_1 - \sum (p_{ij})^2, \tag{14}$$

and 

$$\sum \mathrm{Cov}(X_{ij}, X_{kl}) = S_2 - \sum\sum p_{ij}\, p_{kl}. \tag{15}$$

Since 

$$S_1^2 = \sum (p_{ij})^2 + 2\sum\sum p_{ij}\, p_{kl}, \tag{16}$$

substitution of (14) and (15) in (6) produces, after simplification, 

$$\mathrm{Var}(M) = S_1 + 2S_2 - S_1^2. \tag{17}$$


We turn next to the computation of $S_1$ and $S_2$. The value of $S_1$ is by 
definition 


$$S_1 = \frac{1}{(N-1)^2}\sum_{i<j} d_i d_j. \tag{18}$$

It is, of course, necessary to require i < j in order to prevent duplication of 
cases in the sum. The summation appearing in (18) is the second elementary 
sum of the numbers $d_i$, usually written 


$$a_2 = \sum_{i<j} d_i d_j. \tag{19}$$


These are related to the more familiar power sums, 


$$s_k = \sum_i d_i^k, \tag{20}$$

by the relations (3) 

$$a_2 = \frac{1}{2!}\,(s_1^2 - s_2), \tag{21a}$$

$$a_3 = \frac{1}{3!}\,(s_1^3 - 3s_1 s_2 + 2s_3), \tag{21b}$$

$$a_4 = \frac{1}{4!}\,(s_1^4 - 6s_1^2 s_2 + 3s_2^2 + 8s_1 s_3 - 6s_4). \tag{21c}$$


Those shown will suffice for the present computations. In particular, we have 
established from (18), (19), (20), and (21a) that the mean number of mutuals is 


$$E(M) = S_1 = \frac{s_1^2 - s_2}{2(N-1)^2}. \tag{22}$$

If all $d_i$ equal d we obtain (1) above. 

The value of $S_2$ is somewhat more involved. In the first place, each set 
of four different persons may form mutual pairs in three ways. Secondly, 
each set of three, distinct, may have any of the three at the center of the 
chain of two mutual choices. Taking both of these features into account, we 
have 

$$S_2 = \frac{3}{(N-1)^4}\sum_{i<j<k<l} d_i d_j d_k d_l 
      + \frac{1}{(N-1)^3(N-2)}\sum_{i}\ \sum_{\substack{j<k \\ j,k \ne i}} d_i(d_i - 1)\, d_j d_k. \tag{23}$$

Making use of (21a) and (21b) we have, after some reduction, 

$$S_2 = \frac{1}{8(N-1)^4}\,(s_1^4 - 6s_1^2 s_2 + 3s_2^2 + 8s_1 s_3 - 6s_4) 
      + \frac{1}{2(N-1)^3(N-2)}\,(s_1^2 s_2 - s_2^2 - 2s_1 s_3 + 2s_4 - s_1^3 + 3s_1 s_2 - 2s_3). \tag{24}$$
















Combining this with previous computations, we obtain 

$$\mathrm{Var}(M) = \frac{1}{2(N-1)^4}\,(-2s_1^2 s_2 + s_2^2 + 4s_1 s_3 - 3s_4) 
      + \frac{1}{(N-1)^3(N-2)}\,(s_1^2 s_2 - s_2^2 - 2s_1 s_3 + 2s_4 - s_1^3 + 3s_1 s_2 - 2s_3) 
      + \frac{1}{2(N-1)^2}\,(s_1^2 - s_2). \tag{25}$$


When all $d_i$ equal d, (25) reduces to (2). 

As examples with unequal numbers of choices, consider two sociometric 
measurements on 10 individuals. In the first case, two subjects choose two 
persons each, five choose three, and three choose four. The power sums are 
31, 101, 343, and 1205 for $s_1$, $s_2$, $s_3$, and $s_4$, respectively. Equations (22) 
and (25) give E(M) equal to 5.31 and Var (M) equal to 2.30. Observe that 
the use of $\bar{d}$ equal to 3.1 in (1) and (2) would give the very close 
approximations, E(M) equal to 5.34 and Var (M) equal to 2.29. This happy situation 
would not obtain if the numbers of choices were considerably more variable, 
as in the second example; five make a single choice and five make seven 
choices. The power sums are 40, 250, 1720, and 12,010. From (22) and (25) 
E(M) is 8.33 and Var (M) is 2.33. With $\bar{d}$ of 4, we obtain the values 8.89 
for E(M) and 2.74 for Var (M). 
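
The two examples can be reproduced from (22) and (25) with a short computation on the power sums. The following sketch is illustrative; the function name `mutual_moments` is an assumption, and the group size N is taken to be the number of entries in the list of choices.

```python
from fractions import Fraction

def mutual_moments(choices):
    """Return E(M) by (22) and Var(M) by (25) for the listed numbers of choices d_i."""
    n = len(choices)
    # Power sums s_1 ... s_4 of equation (20).
    s1, s2, s3, s4 = (sum(Fraction(d) ** k for d in choices) for k in (1, 2, 3, 4))
    mean = (s1 ** 2 - s2) / (2 * (n - 1) ** 2)
    var = ((-2 * s1 ** 2 * s2 + s2 ** 2 + 4 * s1 * s3 - 3 * s4) / (2 * (n - 1) ** 4)
           + (s1 ** 2 * s2 - s2 ** 2 - 2 * s1 * s3 + 2 * s4
              - s1 ** 3 + 3 * s1 * s2 - 2 * s3) / ((n - 1) ** 3 * (n - 2))
           + (s1 ** 2 - s2) / (2 * (n - 1) ** 2))
    return mean, var

# First example: two subjects choose two, five choose three, three choose four.
first = mutual_moments([2] * 2 + [3] * 5 + [4] * 3)
# Second example: five make a single choice and five make seven choices.
second = mutual_moments([1] * 5 + [7] * 5)
```

Rounded to two places, `first` gives 5.31 and 2.30 and `second` gives 8.33 and 2.33, the values quoted above. Passing ten equal entries reproduces the fixed-d formulas (1) and (2).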


IV. Distribution of the Number of Mutual Dyads. 

In principle, we could determine the distribution of M exactly by 
conventional methods (5, p. 64). Denote the maximum possible number of 
mutual dyads for a group by Max. We define $S_m$ as suggested by (11a) and 
(11b) above. Then 

$$\Pr(M = m) = S_m - \binom{m+1}{m} S_{m+1} + \binom{m+2}{m} S_{m+2} - \cdots \pm \binom{\mathrm{Max}}{m} S_{\mathrm{Max}}. \tag{26}$$

We have used (26) only for the case where each person makes a single choice, 
where 

$$S_m = \frac{\binom{N}{2m}\binom{2m}{2}\binom{2m-2}{2}\cdots\binom{2}{2}}{m!\,(N-1)^{2m}}, \tag{27}$$

and Max equals N/2. 
We conjecture that for large groups with roughly equal d, M has an 
approximately Poisson distribution; from (2) one readily notes that with 
increasing group size and d held constant, the variance of M approaches 
E(M) and the covariance term approaches zero. 

The senior author is currently engaged in determining higher moments 
for various common fixed d by methods similar to those used to produce the 
variance for the general case. These higher moments will permit determina- 
tion of the distribution of M for small groups. 

A NOTE ON JENKINS’ 
“IMPROVED METHOD FOR TETRACHORIC r” 


JOSHUA A. FISHMAN 


COLLEGE ENTRANCE EXAMINATION BOARD 


Some readers who will be delighted to utilize Jenkins’ method and 
tables for estimating the tetrachoric correlation (1) may be puzzled to dis- 
cover that no explicit provision for negative correlations is included. If we 
follow Jenkins’ instruction to “letter the fourfold table so that a is smaller 
than d and ad greater than bc” four possible arrangements may obtain: 

[Diagram: four fourfold tables, numbered 1 to 4, showing the possible placements of cells a, b, c, and d.] 


Of the above four arrangements, the first (which is the one illustrated by 
Jenkins) and the second are indicative of a positive correlation and the third 
and fourth are indicative of a negative correlation. If either of the latter two 
arrangements does obtain, then the final correction [obtained by multiplying 
the base correction by the multiplier, as in step 5, p. 257, (1)] should be 
(algebraically) added to, rather than subtracted from the negative un- 
corrected r to obtain the corrected tetrachoric r, which should, of course, 
be given a negative sign. The important fact to keep in mind is that the 
correction always reduces the absolute size of the uncorrected tetrachoric r. 
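
The sign rule can be stated compactly: the correction is always taken off the absolute value of the uncorrected r, and the sign of the uncorrected r is kept. A minimal sketch follows. The function name is hypothetical, and it takes the final correction as a number already computed from Jenkins' tables.

```python
def corrected_tetrachoric(uncorrected_r, final_correction):
    """Apply the final correction so that it shrinks |r| and keeps the sign.

    final_correction is the non-negative product of the base correction
    and its multiplier.
    """
    if final_correction < 0:
        raise ValueError("final_correction must be non-negative")
    if uncorrected_r < 0:
        # Arrangements 3 and 4: add algebraically to the negative value.
        return uncorrected_r + final_correction
    # Arrangements 1 and 2: subtract from the positive value.
    return uncorrected_r - final_correction
```

With made-up inputs, `corrected_tetrachoric(-0.50, 0.10)` moves the negative value toward zero, just as `corrected_tetrachoric(0.50, 0.10)` does for the positive one.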

A few words also might be in order concerning the location of decimal 
points in Tables 2 (Base Correction) and 3 (Multipliers for Base Correction). 
Whereas in the former table the omitted decimal points consistently belong 
before the first digit of the reported three-digit table entries, in the latter 
table the omitted decimal points belong before the first digit of two-digit 
table entries, between the first and second digit of three-digit table entries, 
and followed by a zero for one-digit table entries. Thus, whereas a table 
entry of 106 should be understood as .106 in Table 2, it should be taken as 
1.06 in Table 3. Furthermore, 90 is .90 (as illustrated by Jenkins, p. 257) 
but 9 is .09 in Table 3. 
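
Read this way, an entry in Table 2 is the printed integer divided by 1000, and an entry in Table 3 is the printed integer divided by 100. The sketch below encodes that reading. The function names are hypothetical, and Decimal is used only to keep the decimal places exact.

```python
from decimal import Decimal


def base_correction(entry):
    """Table 2: three printed digits, decimal point before the first digit."""
    return Decimal(entry) / Decimal(1000)


def multiplier(entry):
    """Table 3: decimal point placed two digits from the right, so one-digit
    entries gain a leading zero and three-digit entries exceed 1."""
    return Decimal(entry) / Decimal(100)
```

For the entries mentioned above, `base_correction(106)` is .106, while `multiplier(106)`, `multiplier(90)`, and `multiplier(9)` are 1.06, .90, and .09.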


1. Jenkins, W. L. An improved method for tetrachoric r. Psychometrika, 1955, 20, 253-258. 


BOOK REVIEW 


Mathematical Models of Human Behavior. Proceedings of a Symposium Sponsored by 
Dunlap and Associates, Inc., and the Commission on Accidental Trauma, Armed 
Forces Epidemiological Board. 1955. vii + 103 pp. 


Those who expect to read this book in its entirety, or nearly so, would do well to 
turn first to Professor Lazarsfeld’s Concluding Remarks, since they provide some degree 
of unification for what is otherwise a rather disjointed collection of reports on several 
diverse lines of investigation. Those who do not expect to read the entire book would do 
well to read at least Professor Lazarsfeld’s Concluding Remarks for a brief but lucid 
statement of the function of models, especially in the behavioral sciences. He distinguishes 
between static and dynamic models, makes passing reference to the predictive function 
of models, and concentrates on their linguistic function. The linguistic function he divides 
into three parts: organizing, analytical, and mediating. Naturally his remarks refer to 
the papers in the Symposium, but they are quite intelligible in themselves. 

The Symposium itself was held in February of 1954, in connection with a study 
being made by Dunlap and Associates, Inc., on the application of mathematical techniques 
to the study of accidents. Since accidents are “partly the result of human behavior,” and 
since many experts were already engaged in devising and studying mathematical models 
of human behavior, the proposal was made to invite some of these experts to meet together 
to describe their work and to participate in informal discussions. This publication contains 
the papers, but unfortunately not the discussion. 

Of the ten papers included (not counting the Concluding Remarks, and not counting 
a paper by Lorge and Solomon to be published elsewhere) only one deals with accidents. 
This is one by H. H. Jacobs, who discusses the difficulties in trying to separate the effects 
of contagion from individual differences as to liability. Two of the speakers, Bush and 
Estes, discussed somewhat related stochastic learning models. The other seven papers 
had to do, more or less directly, with utility, or decision making, or both. 

For general background on models, the paper by Coombs and Kao might be classed 
along with Lazarsfeld’s Concluding Remarks. This is actually the first section of a report 
on multidimensional analysis, and one feels suspended in mid-air at the end. As an in- 
ducement to learn more, this paper succeeds very well. At the other extreme, a paper by 
Luce on the formation of coalitions in game theory is largely for the experts. The other 
five papers are more self-contained, and more directly concerned with utility and decision 
as such. 

Professor Lazarsfeld gently chides the speakers for their preoccupation with gambling, 
although, as one of them remarks, the gambling situation provides the most direct and 
realistic contact with the individual’s utility function. Merrill Flood’s little “Group 
Preference Experiment” does not make use of a conventional gambling situation, but it 
does afford, to each of the subjects, a possibility, without certainty, of some gain. A collec- 
tion of objects is shown to a group of individuals, and certain broad conditions are laid 
down according to which one of these objects can be had for the group to dispose of. They 
are left to decide which object it shall be and how it shall be disposed of. Disposition might 
be by lot to one of the group, or by auction or sale with proceeds divided, and other possi- 
bilities can be conceived. 

Marschak is concerned with decisions by individuals, pointing out that even when the 
outcome of an act is known with certainty the same individual may make different choices 
at different times. He considers various hypotheses that might be made in formulating a 
model of such inconsistent behavior. Markowitz discusses a hypothesis of Friedman and 
Savage for explaining insurance and lotteries; Markowitz replaces it by one of his own 
which seems to accord better with well-known facts. Jarvik’s discussion of gambling is 
largely discursive, and Edwards describes a series of experiments on gambling which 
endeavor to arrive at utilities. 

The success of the Symposium as such could be judged best from the discussion, which 
is not published. As a publication, it is interesting for showing the diversity of activities 
under way, but unsatisfying just because of the diversity. As an issue of a periodical, this 
little volume would do very well. It lacks the cohesion to stand well by itself. 

The proofreading is rather poor; presumably the spelling “baracentric” on page 22 
is a typographical error. 

A. S. HOUSEHOLDER 








